CN110753024A - Personalized mail re-filtering method in collective environment

Info

Publication number
CN110753024A
Authority
CN
China
Prior art keywords
mail
filter
mails
filtering
user
Legal status
Pending
Application number
CN201810822625.4A
Other languages
Chinese (zh)
Inventor
陈松灿
徐丹丹
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date
2018-07-23
Filing date
2018-07-23
Publication date
2020-02-04
2018-07-23 Application filed by Nanjing University of Aeronautics and Astronautics
2018-07-23 Priority to CN201810822625.4A
2020-02-04 Publication of CN110753024A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0236 Filtering by address, protocol, port number or service, e.g. IP-address or URL
    • H04L63/0245 Filtering by information in the payload
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/42 Mailbox-related aspects, e.g. synchronisation of mailboxes

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Transfer Between Computers (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Because users have different interests, their definitions of spam differ considerably, so personalized spam filtering has become an important research topic in mail filtering. Under full personalization, however, the number of labeled mails available for a given user is limited, and a personalized filter also suffers from label delay. Moreover, the mails received by users in the same group (a school, college or company) are correlated to some degree, so a fully personalized mail filter can learn only limited information. When a mail is misfiltered, the user has to correct it manually, which greatly degrades the user experience. To address these problems, the invention provides a personalized mail re-filtering method in a collective environment that performs personalized mail filtering and automatically corrects misfiltered mails, among other functions.

Description

Personalized mail re-filtering method in collective environment
Technical Field
The invention belongs to the field of information filtering and relates in particular to a personalized mail re-filtering method in a collective environment, which applies data-mining techniques to mail filtering.
Background
Although e-mail is a popular communication tool that brings convenience to people's life and work, the large volume of spam it carries seriously reduces productivity. Moreover, when a mail is misfiltered, the user must manually move it back to the correct mailbox, so spam filtering has become an essential part of any mail service system. A spam filter decides, from known spam characteristics, whether the current mail is normal (labeled 0) or spam (labeled 1). The general filter workflow is shown in Fig. 1. Spam filtering can thus be regarded as a binary text classification problem, but it differs from ordinary text classification in that it is highly personalized: different users may classify the same mail in completely different ways, so a single, globally uniform binary filtering standard cannot match every user's subjective judgment. In a collective environment, on the other hand, the mails received by different users are strongly correlated and interdependent, which calls for filters that balance individual characteristics against collective ones. In addition, mail is an online application: as network culture keeps changing, both the characteristics of spam and the interests of users change, producing a dynamic environment. A traditional spam filter learns from a large corpus and then classifies unlabeled mails, assuming that the training and test data follow the same distribution; in a dynamic real-world environment this assumption does not hold, which poses a major challenge to researchers.
By filtering scope, mail filters fall into two main types: a single generalized filter for all users and a personalized filter for a particular user. The former is usually deployed on the server side and filters the mail of all users; because it learns a single global concept of spam, it cannot accurately reflect individual users' interests and produces many misjudgments. Personalizing spam filtering has therefore become a primary task in the field. A personalized filter is deployed on the client and filters only one user's mail: it analyzes the user's current interests from the user's feedback and filters mail accordingly, which addresses the serious misjudgment problem of generalized filters. In recent years, researchers in China and abroad have proposed a variety of mail filtering methods.
Han et al. proposed the Relaxed Online Support Vector Machine (ROSVM) model, which uses the classic online learning method Online SVM as the filter for identifying mail categories and, by relaxing its constraints, speeds up filter training considerably at little cost. Building on ROSVM, Sun et al. then proposed an active learning method based on misjudgment and low certainty (MLC): only misjudged mails and mails whose predictions are uncertain are selected as training data, which reduces the training cost.
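The MLC selection rule described above (train only on misjudged mails and on mails whose prediction is uncertain) can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited authors' code: it assumes a linear filter whose signed score is positive for spam, and it treats a prediction as low-certainty when the absolute score falls below a hypothetical `margin` parameter.

```python
import numpy as np

def mlc_select(scores, labels, margin=0.5):
    """Return indices of mails to retrain on under an MLC-style rule.

    scores: signed filter outputs (> 0 predicts spam = 1, <= 0 predicts normal = 0)
    labels: true labels after user feedback (0 = normal, 1 = spam)
    margin: hypothetical threshold below which |score| counts as low certainty
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    predicted = (scores > 0).astype(int)
    misjudged = predicted != labels          # wrong predictions
    uncertain = np.abs(scores) < margin      # predictions close to the boundary
    return np.flatnonzero(misjudged | uncertain)
```

For scores [2.0, -1.5, 0.2, -3.0] with labels [1, 1, 1, 0], the function selects indices 1 and 2: mail 1 is misjudged and mail 2 is correct but close to the boundary, so confidently correct mails never enter the training set.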
More recently, to counter the drop in filter performance caused by the constantly changing content of spam and by users' personalized judgments of mail categories, Sanghani et al. proposed a personalized filtering method based on an incremental SVM, in which the attribute set is updated heuristically before the incremental SVM is retrained, so that the classification model can learn the changed data distribution effectively. Feature selection is performed on the retraining samples using Information Gain (IG) to generate a new attribute set, and some low-IG attributes in the original attribute set are replaced. Although these two methods alleviate mail misjudgment to some extent, they do not consider the individual user's subjective judgment of the mail category, which also changes over time. In practice, a generalized filter needs to be robust and a personalized filter needs to be extensible. To address these challenges, Junejo et al. designed a robust personalized filter based on local and global discriminative models. Labeled training samples are used to build separate multidimensional spaces of spam keywords and normal-mail keywords, which are projected onto a two-dimensional space to obtain a local model; the global discriminative model parameters are then obtained by minimizing the filtering error on the training set. The model can serve as a generalized filter, or it can be updated with the unlabeled samples in each user's inbox to serve as a personalized filter, so it suits settings in which the joint distribution of mails and labels varies across users and over time.
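Information gain, used above to rebuild the attribute set, scores each attribute by how much knowing it reduces the entropy of the mail label. The sketch below is a minimal illustration assuming binary term-presence features; `select_top_k` and its `k` parameter are hypothetical, and the cited method's exact policy for replacing low-IG attributes is not reproduced.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a 0/1 label vector."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels, minlength=2) / len(labels)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def information_gain(X, y):
    """IG(y; X_j) = H(y) - sum_v P(X_j = v) * H(y | X_j = v) for binary features."""
    X = np.asarray(X, dtype=bool)
    y = np.asarray(y, dtype=int)
    base = entropy(y)
    gains = []
    for j in range(X.shape[1]):
        present = X[:, j]
        conditional = 0.0
        for mask in (present, ~present):  # term present / term absent
            if mask.any():
                conditional += mask.mean() * entropy(y[mask])
        gains.append(base - conditional)
    return np.array(gains)

def select_top_k(X, y, k):
    """Indices of the k attributes with the highest information gain (hypothetical helper)."""
    return np.argsort(information_gain(X, y))[::-1][:k]
```

For example, with X = [[1, 0], [1, 1], [0, 0], [0, 1]] and y = [1, 1, 0, 0], the first attribute separates spam from normal mail perfectly and gets an IG of 1 bit, while the second carries no information and gets 0.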
Although the two methods above exploit users' personalized features to improve filtering accuracy, neither considers, in light of the actual situation, the misfiltering of mail in a specific user's different mailboxes (the inbox and the garbage box), which leads to differences in class priors and to class imbalance. Moreover, under full personalization, the number of labeled mails from a specific user is limited, and the personalized filter suffers from label delay. To solve these problems, the present invention proposes a personalized mail filtering method designed specifically for a collective environment.
Disclosure of Invention
[ OBJECTS OF THE INVENTION ]
Existing mainstream mail filtering systems still misfilter mail: normal mails end up in the garbage box and spam is delivered to the inbox. Mail misjudgment therefore remains an open problem in mail filtering. Its main causes can be summarized as follows. First, spammers constantly change the content characteristics of spam to evade filters, so the data distribution changes over time. Second, whether a received mail counts as spam depends on the user's interests at the current stage: a user's interests may change from one period to another, and mails of the same type may be labeled differently depending on subjective factors. Both of these cases correspond to concept drift. Finally, a specific user's judgment of mail categories is usually subjective, so a global, user-independent filtering standard that disagrees with the user's own definition causes filtering errors; combining the collective's and the individual's definitions of spam can effectively reduce the probability of such errors.
We formalize concept drift in mail data streams as follows:
(1) For mails entering the same mailbox (inbox or garbage box), the distribution of the data stream changes between time instants:
$P_{t_1}(x, y) \neq P_{t_2}(x, y)$
where $P(\cdot)$ denotes the data distribution, $x$ the feature representation of a mail, $y$ the mail category, and $t_1$, $t_2$ two different time instants. Because spammers keep changing mail content to evade filter detection, the feature distribution differs from one time instant to another.
(2) At the same time instant, the data stream distributions of different mailboxes differ:
$P_i(x, y) \neq P_g(x, y)$
where $P_i(\cdot)$ denotes the inbox data distribution and $P_g(\cdot)$ the garbage-box data distribution. In general, a user's inbox contains more normal mails than spam, while the garbage box contains more spam than normal mails. In severe cases this leads to class imbalance: $P_i(y=0 \mid x) \gg P_g(y=0 \mid x)$ and $P_i(y=1 \mid x) \ll P_g(y=1 \mid x)$. We refer to these two cases as "generalized virtual drift" in mail.
Unlike existing spam filters, and in order both to increase the diversity of spam samples and to protect user privacy, users in the same group share only spam, which improves the accuracy with which the personalized filter predicts spam. To solve the three problems above and to provide personalized mail filtering, automatic correction of misfiltered mails and related functions, the invention combines rules with statistical methods to provide a client-based personalized mail re-filtering system in a collective environment (A Personalized Mail Re-filtering System Based on the Client in the Collective Environment, Co-PRFC). Most existing spam filters only filter the mail data stream online and do not consider the differences in class priors and the class imbalance between different mailboxes. The proposed system first processes mails entering the inbox and the garbage box separately, then designs, on the multi-task learning principle, two filters that learn from each other to filter the inbox and the garbage box respectively, and automatically corrects misfiltered mails. In addition, to maintain filter performance as the user's interests and the mail data distribution change over time, a multi-window learning framework combined with importance weighting is designed, which lets the filter adapt dynamically.
[ TECHNICAL SOLUTION ]
To protect the privacy of each network user, one of the scenarios assumed by the invention is that all users of the same group can independently share their own spam so that other users can use this public information, which increases the diversity of spam available for personalized filtering.
The invention comprises the following:
The number of users in a group is fixed. When a spam mail is shared, the system first checks whether it duplicates a mail already in the collective garbage box; if so, the mail's reporting rate is updated, otherwise the mail is added to the collective garbage box.
Mails with a high reporting rate are delivered one by one into a specific user's private garbage box, where Co-PRFC decides, according to the user's interests, whether each is spam. If it is, the mail is placed in the garbage box; otherwise it is placed in the inbox.
Because the invention mainly targets misclassification, it uses two filters, Filter_junkbox and Filter_inbox, to filter the garbage-box data stream and the inbox data stream respectively; with this separation, case (2) of generalized virtual drift can occur.
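The text states that Filter_inbox and Filter_junkbox learn from each other under a multi-task learning principle but does not give the formulation. One common way to realize this, swapped in here purely as an illustration, is regularized multi-task learning: each filter's weight vector is a shared component `w0` plus a mailbox-specific offset `v[t]`, and a penalty keeps the offsets small so that each mailbox's scarce minority class borrows strength from the other mailbox. The logistic loss, learning rate, penalty values and function names are all hypothetical.

```python
import numpy as np

def train_multitask(tasks, n_features, lam_shared=0.01, lam_task=0.1, lr=0.1, epochs=200):
    """Jointly fit one logistic filter per mailbox by full-batch gradient descent.

    tasks: dict mailbox name -> (X, y), y in {0, 1} (1 = spam).
    Filter weights for mailbox t are w0 + v[t]; w0 is shared across mailboxes.
    """
    w0 = np.zeros(n_features)                             # shared knowledge
    v = {name: np.zeros(n_features) for name in tasks}    # mailbox-specific offsets
    for _ in range(epochs):
        grad_w0 = lam_shared * w0
        for name, (X, y) in tasks.items():
            p = 1.0 / (1.0 + np.exp(-(X @ (w0 + v[name]))))
            g = X.T @ (p - y) / len(y)                    # gradient of the mean logistic loss
            grad_w0 = grad_w0 + g                         # every mailbox updates the shared part
            v[name] = v[name] - lr * (g + lam_task * v[name])
        w0 = w0 - lr * grad_w0
    return {name: w0 + v[name] for name in tasks}

def predict(weights, X):
    """Label 1 (spam) when the logistic probability exceeds 0.5."""
    return (X @ weights > 0).astype(int)
```

The penalty `lam_task` controls how far the two filters may diverge: a large value pushes them toward one shared filter, a small one toward two independent filters.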
A specific user's interests also change over time, so the invention designs a multi-window learning framework with two windows of truly labeled mail (a long window LW and a short window SW) and one window without true labels (a target window TW). Whether the user's interest has changed is detected by comparing the prediction accuracy of sub-model L (trained on LW) and sub-model S (trained on SW); if it has changed, L is reset from S. LW contains all samples collected since the last model update, whereas SW stores a fixed number of the most recent samples. Hence, when the error rate of L is lower than that of S, the current user's interests are stable; otherwise, they have recently changed.
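The long/short-window interest check can be sketched as follows. This is a hypothetical illustration: the trainer `fit`, the window length `sw_size`, and the test-then-train evaluation (both models score each new labeled mail before learning from it, and their error rates are compared over the last `sw_size` mails) are assumptions; the text only states that L is reset from S when S becomes more accurate.

```python
from collections import deque

class MultiWindowDetector:
    """Long window LW / short window SW interest-change check (illustrative sketch).

    fit: hypothetical trainer mapping a list of (x, y) pairs to a model,
         where model(x) returns a 0/1 label.
    """

    def __init__(self, fit, sw_size=30):
        self.fit = fit
        self.lw = []                          # LW: all labeled mail since the last reset
        self.sw = deque(maxlen=sw_size)       # SW: only the sw_size most recent mails
        self.errors = deque(maxlen=sw_size)   # recent (L wrong, S wrong) flags
        self.model_L = None
        self.model_S = None

    def observe(self, x, y):
        """Process one labeled mail; return True if L was reset from S."""
        reset = False
        if self.model_L is not None:
            # Test-then-train: score both models before they learn from this mail.
            self.errors.append((self.model_L(x) != y, self.model_S(x) != y))
            if len(self.errors) == self.errors.maxlen:
                err_L = sum(e[0] for e in self.errors) / len(self.errors)
                err_S = sum(e[1] for e in self.errors) / len(self.errors)
                if err_S < err_L:
                    # S beats L on recent mail: interest has changed, so LW restarts from SW.
                    self.lw = list(self.sw)
                    self.errors.clear()
                    reset = True
        self.lw.append((x, y))
        self.sw.append((x, y))
        self.model_L = self.fit(self.lw)
        self.model_S = self.fit(list(self.sw))
        return reset
```

With a toy majority-label trainer and a stream of 100 mails labeled 0 followed by 100 labeled 1 (a simulated change of interest), the detector resets L at index 116, once S has adapted and its recent error rate drops below that of L. Comparing predictions made before training avoids the bias of scoring S on the very samples it was fitted to.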
The invention uses a kernel density ratio to detect whether the distribution of mail content has changed. If it has, model S is retrained to adapt to the new data distribution, which improves the filter's accuracy; otherwise, S is left unchanged. To avoid computing the data distribution $P_{TW}(x)$ explicitly, its density ratio is estimated directly with kernel functions:
$\hat{w}(x) = \frac{P_{TW}(x)}{P_{SW}(x)} \approx \sum_{l=1}^{N_m} \alpha_l \varphi_l(x)$
where $N_m$ is the size of windows TW and SW, $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_{N_m})^\top$ are the model parameters, and $\varphi_l(x)$ are the basis functions.
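The text does not say which density-ratio estimator is used. The sketch below swaps in unconstrained least-squares importance fitting (uLSIF, Kanamori et al.), which fits a linear combination of Gaussian kernels centred on the TW samples, i.e. one basis function per TW sample. The kernel width `sigma`, the regularizer `lam`, and the decision rule (flag drift when the estimated Pearson divergence between TW and SW exceeds a hypothetical `threshold`) are assumptions.

```python
import numpy as np

def gaussian_kernel(x, centers, sigma):
    """Matrix K[n, l] = exp(-||x_n - c_l||^2 / (2 sigma^2))."""
    sq = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def fit_density_ratio(x_tw, x_sw, sigma=1.0, lam=0.1):
    """uLSIF fit of w(x) = sum_l alpha_l K(x, c_l), approximating P_TW(x) / P_SW(x).

    The kernel centres c_l are the TW samples, giving N_m basis functions.
    """
    phi_tw = gaussian_kernel(x_tw, x_tw, sigma)
    phi_sw = gaussian_kernel(x_sw, x_tw, sigma)
    H = phi_sw.T @ phi_sw / len(x_sw)      # empirical E_SW[phi phi^T]
    h = phi_tw.mean(axis=0)                # empirical E_TW[phi]
    alpha = np.linalg.solve(H + lam * np.eye(len(h)), h)
    return alpha, H, h

def density_ratio(x, x_tw, alpha, sigma=1.0):
    """Estimated importance weights, clipped at 0 because ratios cannot be negative."""
    return np.maximum(gaussian_kernel(x, x_tw, sigma) @ alpha, 0.0)

def pearson_divergence(x_tw, x_sw, sigma=1.0, lam=0.1):
    """Plug-in Pearson divergence estimate: h^T alpha - alpha^T H alpha / 2 - 1/2."""
    alpha, H, h = fit_density_ratio(x_tw, x_sw, sigma, lam)
    return float(h @ alpha - 0.5 * alpha @ H @ alpha - 0.5)

def distribution_changed(x_tw, x_sw, threshold=0.5, sigma=1.0, lam=0.1):
    """Hypothetical drift rule: retrain S when the divergence exceeds the threshold."""
    return pearson_divergence(x_tw, x_sw, sigma, lam) > threshold
```

Evaluating `density_ratio` on the SW samples gives importance weights that could be used when retraining S, which is one way to read the importance weighting mentioned in the Disclosure.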
Filtering spam with machine learning generally requires parsing, preprocessing and vectorizing mails, which is time-consuming. Co-PRFC therefore combines rules with statistical methods to filter spam, reducing computational complexity and shortening filtering time. For a mail to be classified, it first checks whether the sender is trusted; if so, the mail goes to the inbox. Otherwise, it decides whether the mail is normal according to whether the subject contains 're' or 'reply'; if not, the subject and body are vectorized in turn and the category is predicted (as shown in Fig. 2). The vectorized subjects and bodies of the garbage-box and inbox data streams serve as the input variables of Filter_junkbox and Filter_inbox respectively.
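The rule-then-statistics flow above (trusted sender to the inbox; 're' or 'reply' in the subject means normal mail; otherwise vectorize the subject and body and classify) can be sketched as follows. The trusted-sender set, the matching of 're'/'reply' as whole words regardless of case, the toy bag-of-words vectorizer and the `classify` callback are all hypothetical choices for illustration; the text does not specify them.

```python
import re
from collections import Counter

INBOX, JUNKBOX = "inbox", "junkbox"
# Hypothetical rule: "re" or "reply" as a whole word, any case (matches "Re: lunch", not "Free").
REPLY_PATTERN = re.compile(r"\b(re|reply)\b", re.IGNORECASE)

def vectorize(text):
    """Toy bag-of-words vector; a real filter would use a fitted vocabulary."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def route(mail, trusted_senders, classify):
    """Return INBOX or JUNKBOX for a mail dict with 'sender', 'subject' and 'body'.

    trusted_senders: set of lower-case addresses (assumption).
    classify(subject_vec, body_vec) -> 1 for spam, 0 for normal; a hypothetical
    stand-in for the trained statistical filter.
    """
    # Rule 1: mail from a trusted sender goes straight to the inbox.
    if mail["sender"].lower() in trusted_senders:
        return INBOX
    # Rule 2: a reply is treated as normal mail without running the classifier.
    if REPLY_PATTERN.search(mail["subject"]):
        return INBOX
    # Otherwise vectorize subject and body and fall back to the statistical filter.
    label = classify(vectorize(mail["subject"]), vectorize(mail["body"]))
    return JUNKBOX if label == 1 else INBOX
```

Only mails that pass both cheap rules reach the vectorizer and classifier, which is where the claimed saving in filtering time comes from.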
[ BENEFICIAL EFFECTS ]
We implemented the proposed filtering system Co-PRFC in Python. Models L and S, corresponding to LW and SW, are implemented with an ensemble algorithm that uses an SVM as the base learner. To obtain optimal parameters, 10,000 samples were randomly selected from TREC 2006c and used to construct class-imbalanced private inbox and garbage-box data streams in proportion. Experiments also show that multi-task learning and exploiting the collective environment improve filtering performance. Using TREC 2006c, TREC 2007p and SEWM2010 as experimental data, Co-PRFC was compared with existing filters, and the proposed filter was shown to achieve a remarkable filtering effect. The method generalizes: besides mail, it can also filter information such as short messages and microblog comments.
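The text says only that models L and S are ensembles with an SVM base learner. As an illustration, the sketch below swaps in bagging (bootstrap resamples combined by majority vote) over linear SVMs trained with the Pegasos stochastic sub-gradient method; the ensemble type, member count and hyperparameters are assumptions, not the authors' configuration. Labels use {-1, +1}, so the document's 0/1 labels would need mapping, and a constant feature column stands in for the bias term.

```python
import numpy as np

def pegasos_svm(X, y, lam=0.01, iters=2000, rng=None):
    """Linear SVM trained with Pegasos stochastic sub-gradient steps; y in {-1, +1}."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.zeros(X.shape[1])
    for t in range(1, iters + 1):
        i = rng.integers(len(y))
        eta = 1.0 / (lam * t)
        violates = y[i] * (X[i] @ w) < 1   # hinge-loss margin check before shrinking w
        w *= 1.0 - eta * lam               # regularization step
        if violates:
            w += eta * y[i] * X[i]         # sub-gradient step on the hinge loss
    return w

def bagged_svm(X, y, n_members=5, seed=0, **kw):
    """Bagging ensemble: each member is trained on a bootstrap resample; majority vote."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        idx = rng.integers(len(y), size=len(y))
        members.append(pegasos_svm(X[idx], y[idx], rng=rng, **kw))
    W = np.stack(members)

    def predict(Xq):
        votes = np.sign(Xq @ W.T).sum(axis=1)
        return np.where(votes >= 0, 1, -1)

    return predict
```

A function with this shape could serve as the `fit` argument of a long/short-window detector after wrapping it to accept (x, y) pairs.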
Drawings
FIG. 1: Main flow of spam filtering
FIG. 2: Labeling process for a mail to be predicted by Co-PRFC
FIG. 3: Co-PRFC system framework
Detailed Description
The invention is further illustrated by the following embodiment, which is purely exemplary and does not limit the scope of the invention; various equivalent modifications will become apparent to those skilled in the art after reading this disclosure, and such modifications fall within the scope of the appended claims.
The framework of the invention is shown in Fig. 3. Users of the same collective network can autonomously share spam so as to exchange information. We set the number of users in the collective to M (150 in our setting), which remains unchanged. A shared spam mail is first checked against the collective garbage box: if it is already there, its reporting rate (the fraction of users who have judged it to be spam) is updated; if not, its initial reporting rate is set to 1/M and it is placed in the collective garbage box. The filtering system of a specific user accesses the collective garbage box from time to time, takes the mails whose reporting rate is above 1/3, and feeds the non-redundant ones into the private garbage-box data stream for detection. The Co-PRFC pseudocode is as follows:
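The collective garbage box described above can be sketched as follows. The deduplication key (`mail_key`, for example a hash of normalized mail content) and the rule that each user counts at most once per mail are assumptions; M = 150 and the 1/3 threshold come from the text. Exact fractions avoid floating-point trouble at the threshold.

```python
from fractions import Fraction

class CollectiveTrash:
    """Shared garbage box for a fixed group of M users (the text uses M = 150)."""

    def __init__(self, num_users=150, threshold=Fraction(1, 3)):
        self.num_users = num_users
        self.threshold = threshold
        self.reporters = {}  # hypothetical mail_key -> set of users who reported it as spam

    def report(self, user, mail_key):
        """Share a spam mail; a duplicate only raises the existing mail's reporting rate."""
        self.reporters.setdefault(mail_key, set()).add(user)

    def rate(self, mail_key):
        """Reporting rate = reporters / M, so a newly shared mail starts at 1/M."""
        return Fraction(len(self.reporters.get(mail_key, ())), self.num_users)

    def fetch(self, already_seen):
        """Mails above the threshold that this user's private stream has not seen yet."""
        return [key for key in self.reporters
                if self.rate(key) > self.threshold and key not in already_seen]
```

With M = 150, a mail needs at least 51 distinct reporters (51/150 > 1/3) to reach users' private garbage-box streams; 50 reporters give exactly 1/3, which is not above the threshold.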
Input: samples with true labels; samples without true labels; the parsed test mail email; the initial position T0 and current position T1 of LW; an acceptable error-rate threshold ρ for the L model; a confidence threshold ξ for predicted labels; and the initialized filters Filter_inbox and Filter_garbage.
Output: the predicted label y of email.
[The pseudocode body appears only as images in the original publication.]

Claims (4)

1. A personalized mail re-filtering method in a collective environment, comprising the following steps:
First, the number of users in the same group is fixed. When a spam mail is shared, it is first checked against the mails in the collective garbage box; if it is a duplicate, the mail's reporting rate is updated; otherwise, the mail is added to the collective garbage box.
Second, mails with a high reporting rate are delivered one by one into the private garbage box of a specific user, and Co-PRFC determines, according to the user's interests, whether each is spam. If it is, the mail is placed in the garbage box; otherwise, it is placed in the inbox.
Third, filtering spam with machine learning generally requires parsing, preprocessing and vectorizing mails, which is time-consuming; Co-PRFC therefore combines rules with statistical methods to filter spam, reducing computational complexity and shortening filtering time. For a mail to be classified, it is first checked whether the sender is trusted; if so, the mail is placed in the inbox. Otherwise, whether the mail is normal is decided according to whether its subject contains 're' or 'reply'; if not, the subject and body are vectorized in turn and the category is predicted (as shown in Fig. 2).
Fourth, to address misclassification, two filters (Filter_junkbox and Filter_inbox) filter the garbage-box data stream and the inbox data stream respectively, taking the vectorized subjects and bodies of the respective streams as their input variables. Because separate filtering can give rise to case (2) of generalized virtual drift, the invention, based on multi-task learning theory, lets the two filters learn from each other by drawing on each other's feature descriptions while each filters its own stream, which alleviates the class imbalance problem.
Fifth, a specific user's interests also change over time, so the invention designs a multi-window learning framework with two windows of truly labeled mail (a long window LW and a short window SW) and one window without true labels (a target window TW). Whether the user's interest has changed is detected by comparing the prediction accuracy of sub-models L and S on the mails of the long and short windows; if it has changed, L is reset from S. LW contains all samples collected since the last model update, whereas SW stores a fixed number of the most recent samples. Hence, when the error rate of L is lower than that of S, the current user's interests are stable; otherwise, they have recently changed.
Sixth, the invention uses a kernel density ratio to detect whether the distribution of mail content has changed; if it has, model S is retrained to adapt to the new data distribution, improving the filter's accuracy; otherwise, S is left unchanged. To avoid computing the data distribution $P_{TW}(x)$ explicitly, its density ratio is estimated with kernel functions:
$\hat{w}(x) = \frac{P_{TW}(x)}{P_{SW}(x)} \approx \sum_{l=1}^{N_m} \alpha_l \varphi_l(x)$
where $N_m$ is the size of windows TW and SW, $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_{N_m})^\top$ are the model parameters, and $\varphi_l(x)$ are the basis functions.
2. The method according to claim 1, wherein the sharing of spam in a collective environment in the first and second steps increases the diversity of spam available to a specific user while preserving user privacy, improving the accuracy with which the personalized filter predicts spam. In the first step, to keep the mails in the collective garbage box non-redundant and to judge whether a mail is recognized spam, each mail is tagged with its reporting rate; for a specific user, recognized spam is not necessarily spam and may be normal mail, so the personalized filter determines the category of each mail taken from the collective garbage box for subsequent filter learning.
3. The method according to claim 1, wherein the two filters of the fourth step provide a new filtering method that further alleviates misfiltering by the system's filters and automatically corrects misfiltered mails. To address the class imbalance of the data streams of the two mailboxes (the private inbox and the private garbage box), the invention, based on multi-task learning theory, lets the two filters learn from each other while each filters its own stream.
4. The method according to claim 1, wherein the multi-window framework and the kernel density ratio of the fifth and sixth steps effectively mitigate concept drift. The multi-window framework compares the accuracy of the filters on different windows to detect whether the user's interests have changed, and adjusts the filter if they have; the kernel density ratio determines whether the distribution of the current mail content has drifted, and updates the filter if it has. Together, the two considerably reduce the difficulty of mail filtering.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810822625.4A CN110753024A (en) 2018-07-23 2018-07-23 Personalized mail re-filtering method in collective environment

Publications (1)

Publication Number Publication Date
CN110753024A 2020-02-04

Family

ID=69275581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810822625.4A Pending CN110753024A (en) 2018-07-23 2018-07-23 Personalized mail re-filtering method in collective environment

Country Status (1)

Country Link
CN (1) CN110753024A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1976323A * 2006-12-12 2007-06-06 South China University of Technology Spam identification method and system based on interest cognition
CN101018215A * 2007-03-13 2007-08-15 Hangzhou Huawei-3Com Technology Co., Ltd. Mail filtering system and mail filtering method
CN101043466A * 2006-03-21 2007-09-26 Acer Inc. Information pickup method and handheld mobile communication device using the same
CN101309232A * 2007-05-18 2008-11-19 Hongfujin Precision Industry (Shenzhen) Co., Ltd. Mail filtering system and method
CN104980335A * 2014-08-14 2015-10-14 Tencent Technology (Shenzhen) Co., Ltd. Method and system for processing received e-mails
CN106100973A * 2016-06-07 2016-11-09 China University of Petroleum (East China) Personalized spam filtering method and filtering device based on node similarity
CN107291765A * 2016-04-05 2017-10-24 Nanjing University of Aeronautics and Astronautics Clustering method for handling missing data based on DC programming
US20170374002A1 * 2005-04-14 2017-12-28 TJ2Z Patent Licensing and Tech Transfer, LLC Method and apparatus for storing email messages

Non-Patent Citations (1)

Title
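Wang Yonggui et al.: "Collaborative filtering recommendation algorithm based on improved clustering and matrix factorization", Journal of Computer Applications *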

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117151668A * 2023-10-30 2023-12-01 Taiping Financial Technology Services (Shanghai) Co., Ltd. Automatic mail cleaning method and device, electronic equipment and storage medium
CN117151668B * 2023-10-30 2024-01-19 Taiping Financial Technology Services (Shanghai) Co., Ltd. Automatic mail cleaning method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Li et al. Multi-window based ensemble learning for classification of imbalanced streaming data
CN108717408B (en) Sensitive word real-time monitoring method, electronic equipment, storage medium and system
Basavaraju et al. A novel method of spam mail detection using text based clustering approach
Ma et al. A comparative approach to Naïve Bayes classifier and support vector machine for email spam classification
CN101345720B (en) Junk mail classification method based on partial match estimation
CN103473218B E-mail classification method and device
CN105447505B Multi-level important email detection method
CN109889436B (en) Method for discovering spammer in social network
CN109902740B (en) Re-learning industrial control intrusion detection method based on multi-algorithm fusion parallelism
CN106599913A (en) Cluster-based multi-label imbalance biomedical data classification method
CN112199608A (en) Social media rumor detection method based on network information propagation graph modeling
CN109145114B (en) Social network event detection method based on Kleinberg online state machine
Gu et al. [Retracted] Application of Fuzzy Decision Tree Algorithm Based on Mobile Computing in Sports Fitness Member Management
CN115687925A (en) Fault type identification method and device for unbalanced sample
CN111046171B (en) Emotion discrimination method based on fine-grained labeled data
WO2020135054A1 (en) Method, device and apparatus for video recommendation and storage medium
CN107533574A (en) Email relationship finger system based on random index pattern match
CN110753024A (en) Personalized mail re-filtering method in collective environment
Carmona-Cejudo et al. Using gnusmail to compare data stream mining methods for on-line email classification
CN112668633A (en) Adaptive graph migration learning method based on fine granularity field
Salehi et al. Hybrid simple artificial immune system (SAIS) and particle swarm optimization (PSO) for spam detection
CN116578708A (en) Paper data name disambiguation algorithm based on graph neural network
CN116561639A (en) Multi-mode data emotion analysis method for open source information
Kaur et al. E-mail spam detection using refined mlp with feature selection
CN115964478A (en) Network attack detection method, model training method and device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200204