CN110912917A - Malicious URL detection method and system - Google Patents
- Publication number
- CN110912917A (application number CN201911207542.5A)
- Authority
- CN
- China
- Prior art keywords
- url
- malicious
- sample set
- labeled
- data set
- Legal status
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1483—Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Computing Systems (AREA)
- Computer Hardware Design (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Probability & Statistics with Applications (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention provides a malicious URL detection method and system. The method comprises the following steps: acquiring a URL data set to be analyzed and a malicious URL training sample set; training an SVM (support vector machine) with the malicious URL training sample set and using it to classify the URL data set to be analyzed, yielding a malicious URL data set and a URL data set to be labeled; clustering the URL data set to be labeled with a clustering algorithm to obtain a URL sample set to be labeled; labeling this sample set according to a judgment of whether each sample is malicious, which divides it into a labeled malicious URL sample set and a non-labeled URL sample set; taking the union of the labeled malicious URL sample set and the malicious URL training sample set to obtain an updated malicious URL training sample set; and subtracting the labeled malicious URL sample set from the URL data set to be labeled to obtain an updated URL test data set. The malicious URL detection method and system are novel in design and highly practical.
Description
Technical Field
The invention relates to the technical field of network information security, in particular to a malicious URL detection method and system.
Background
With the rapid development of the Internet, malicious URL attacks have become increasingly common and pose a serious threat to network security. Conventional URL attack detection systems rely primarily on blacklists or rule lists. These lists keep growing, and it is impractical to defend against every attack this way. More seriously, such methods struggle to detect latent threats, which makes it hard for network security engineers to discover new malicious URL attacks effectively.
To improve generalization, many researchers have adopted machine learning-based approaches. These fall mainly into two categories. The first is unsupervised, such as anomaly detection, which requires no labeled data; however, such models are far more demanding of their input features than typical supervised models, and their precision among the top-scored samples is hard to maintain once the number of features grows even slightly. The second is supervised: data are labeled manually based on human business experience and a model is then trained on those labels, but labeling is costly and annotators introduce subjective errors, which reduces accuracy.
Supervised learning methods generally generalize better when labeled data are available. In many cases, however, accurate annotations are hard to obtain. More often, only a small fraction of malicious URLs is known alongside a large number of unlabeled URL samples, with no sufficiently reliable negative examples, so the machine learning algorithms above cannot be applied directly. Falling back on a purely unsupervised approach makes it difficult to exploit the annotations of known malicious URLs and may not achieve satisfactory performance.
Disclosure of Invention
The invention provides a malicious URL detection method and system to address these technical problems.
The technical scheme provided by the invention is as follows:
The invention provides a malicious URL detection method comprising the following steps:
S1, acquiring a URL data set to be analyzed and a malicious URL training sample set; training an SVM (support vector machine) with the malicious URL training sample set and using it to classify the URL data set to be analyzed, yielding a malicious URL data set and a URL data set to be labeled;
S2, clustering the URL data set to be labeled with a clustering algorithm to obtain a URL sample set to be labeled; labeling this sample set according to a judgment of whether each sample is malicious, dividing it into a labeled malicious URL sample set and a non-labeled URL sample set; taking the union of the labeled malicious URL sample set and the malicious URL training sample set to obtain an updated malicious URL training sample set; and subtracting the labeled malicious URL sample set from the URL data set to be labeled to obtain an updated URL test data set;
S3, training the SVM with the updated malicious URL training sample set to classify the updated URL test data set, and outputting an unlabeled URL data set.
In the malicious URL detection method, the clustering algorithm is a K-means clustering algorithm, a mean-shift clustering algorithm, a density-based clustering algorithm, an expectation-maximization clustering algorithm based on a Gaussian mixture model, an agglomerative hierarchical clustering algorithm, or a graph community detection method.
In the malicious URL detection method, step S2 uses the Mini-Batch K-means algorithm to cluster the URL data set to be labeled, thereby obtaining the URL sample set to be labeled.
In the malicious URL detection method of the present invention, step S3 further includes subtracting the unlabeled URL data set from the URL data set to be analyzed, thereby obtaining a final malicious URL data set.
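The set bookkeeping in steps S2 and S3 amounts to ordinary set algebra. The following Python sketch assumes each URL is represented as a hashable string and that the SVM classification and expert labeling have already produced the input sets; the function names `update_sets` and `final_malicious` and the example URLs are hypothetical, not part of the patent.

```python
def update_sets(train_malicious, to_label, labeled_malicious):
    """S2: grow the malicious training set and shrink the test set."""
    updated_train = train_malicious | labeled_malicious   # union
    updated_test = to_label - labeled_malicious           # set difference
    return updated_train, updated_test


def final_malicious(to_analyze, unlabeled_output):
    """S3 (further step): URLs not output as unlabeled are treated as malicious."""
    return to_analyze - unlabeled_output


train = {"http://bad.example/a"}
to_label = {"http://x.example/1", "http://x.example/2", "http://x.example/3"}
labeled = {"http://x.example/1"}   # confirmed malicious by the expert
new_train, new_test = update_sets(train, to_label, labeled)
print(sorted(new_train))
print(sorted(new_test))
```

Because the updated test set excludes everything the expert has already confirmed, each round of S3 classifies only URLs whose status is still open.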
The invention also provides a malicious URL detection system, which comprises:
the active learning module, used for acquiring a URL data set to be analyzed and a malicious URL training sample set, training an SVM (support vector machine) with the malicious URL training sample set, and using it to classify the URL data set to be analyzed to obtain a malicious URL data set and a URL data set to be labeled;
the labeling module, used for clustering the URL data set to be labeled with a clustering algorithm to obtain a URL sample set to be labeled; labeling this sample set according to a judgment of whether each sample is malicious, dividing it into a labeled malicious URL sample set and a non-labeled URL sample set; taking the union of the labeled malicious URL sample set and the malicious URL training sample set to obtain an updated malicious URL training sample set; and subtracting the labeled malicious URL sample set from the URL data set to be labeled to obtain an updated URL test data set;
and the output module, used for training the SVM with the updated malicious URL training sample set to classify the updated URL test data set, and outputting the unlabeled URL data set.
In the malicious URL detection system, the clustering algorithm is a K-means clustering algorithm, a mean-shift clustering algorithm, a density-based clustering algorithm, an expectation-maximization clustering algorithm based on a Gaussian mixture model, an agglomerative hierarchical clustering algorithm, or a graph community detection method.
In the malicious URL detection system, the labeling module is further used for clustering the URL data set to be labeled with the Mini-Batch K-means algorithm, thereby obtaining the URL sample set to be labeled.
In the malicious URL detection system, the output module trains the SVM with the updated malicious URL training sample set to classify the updated URL test data set, and outputs the unlabeled URL data set.
The malicious URL detection method and system can effectively discover latent malicious URL attacks. They can be deployed as a supplement to existing systems and can help network security engineers identify latent attack patterns, so that these can be promptly incorporated into existing systems. The method and system are novel in design and highly practical.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
Fig. 1 is a flowchart of step S1 of the malicious URL detection method according to a preferred embodiment of the present invention;
Fig. 2 is a schematic diagram of how the processing result changes in step S1 of the malicious URL detection method shown in Fig. 1;
Fig. 3 is a schematic diagram of how the processing result changes in the malicious URL detection method according to the preferred embodiment of the present invention;
Fig. 4 is a functional block diagram of a malicious URL detection system according to a preferred embodiment of the present invention.
Detailed Description
The technical problem solved by the invention is as follows: in URL detection, typically only a small fraction of malicious URLs and a large number of unlabeled URL samples are available, and sufficiently reliable negative samples are lacking, so conventional machine learning algorithms cannot be used directly. Falling back on a purely unsupervised approach makes it difficult to exploit the annotations of known malicious URLs and may not achieve satisfactory performance. The technical idea of the invention is to build a malicious URL detection method and system that combines active learning (AL) with semi-supervised positive-unlabeled (PU) learning. With a limited manual labeling workload, a malicious URL detection model is developed for a URL data set; at the same accuracy, it identifies far more malicious URLs than unsupervised or semi-supervised models.
In order to make the technical purpose, technical solutions and technical effects of the present invention more clear and facilitate those skilled in the art to understand and implement the present invention, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.
The preferred embodiment of the invention provides a malicious URL detection method, which comprises the following steps:
S1, acquiring a URL data set to be analyzed and a malicious URL training sample set; training an SVM (support vector machine) with the malicious URL training sample set and using it to classify the URL data set to be analyzed, yielding a malicious URL data set and a URL data set to be labeled;
This step adopts an active learning method, labeling the URL data set to be analyzed with the malicious URL training sample set serving as labels. Preferably, while the SVM trained on the malicious URL training sample set classifies the URL data set to be analyzed, a data labeling expert also supervises and iteratively refines the results to ensure label accuracy. For example, in Fig. 1, suppose the URL data set to be analyzed consists of original unlabeled data x1, x2, x3, …; the SVM serves as the Active Learning classifier and labels x1, x2, x3, …, while a data labeling expert supervises and iteratively refines the labeling.
Step S1 does not restrict the specific type of the Active Learning classifier in Fig. 1; supervised classification, in which the URL data set to be analyzed is directly split into two classes (binary classification) based on the malicious URL training sample set, is the simplest and most direct method. To improve classification efficiency and accuracy, the malicious URL detection method of the present invention introduces steps S2 and S3.
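The patent does not specify the SVM variant, the URL features, or how an SVM is trained from malicious samples alone. A common positive-unlabeled baseline, assumed here, treats known malicious URLs as +1 and unlabeled URLs as provisional -1. The sketch below trains a linear SVM with Pegasos-style stochastic sub-gradient descent on the hinge loss; the two-dimensional feature vectors are hypothetical placeholders for whatever URL features a real system would extract.

```python
import random


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic sub-gradient descent for a linear SVM.

    X: list of numeric feature vectors; y: labels in {+1, -1}.
    A constant 1.0 feature is appended so the bias is learned as a weight.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                     # Pegasos step size
            violated = label * _dot(w, x) < 1         # hinge loss is active
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regulariser
            if violated:
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (malicious) or -1 for a single feature vector."""
    return 1 if _dot(w, list(x) + [1.0]) >= 0 else -1


# Positive-unlabeled baseline: known malicious URLs are +1, unlabeled URLs -1.
malicious = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0]]
unlabeled = [[-2.0, -2.0], [-3.0, -2.0], [-2.0, -3.0]]
w = train_linear_svm(malicious + unlabeled, [1] * 3 + [-1] * 3)
print([predict(w, x) for x in malicious + unlabeled])
```

Treating every unlabeled URL as negative biases the classifier towards high precision and low recall on the malicious class, which is the gap that the expert labeling in S2 is meant to close.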
S2, clustering the URL data set to be labeled with a clustering algorithm to obtain a URL sample set to be labeled; labeling this sample set according to a judgment of whether each sample is malicious, dividing it into a labeled malicious URL sample set and a non-labeled URL sample set; taking the union of the labeled malicious URL sample set and the malicious URL training sample set to obtain an updated malicious URL training sample set; and subtracting the labeled malicious URL sample set from the URL data set to be labeled to obtain an updated URL test data set;
In this step, the clustering algorithm is a K-means clustering algorithm, a mean-shift clustering algorithm, a density-based clustering algorithm, an expectation-maximization clustering algorithm based on a Gaussian mixture model, an agglomerative hierarchical clustering algorithm, or a graph community detection method.
Preferably, in this embodiment, this step uses the Mini-Batch K-means algorithm to cluster the URL data set to be labeled, thereby obtaining the URL sample set to be labeled.
The Mini-Batch K-means algorithm is a clustering model that preserves clustering accuracy as far as possible while greatly reducing computation time: it uses mini-batches to cut computation while still attempting to optimize the objective function. A mini-batch is a subset of the data drawn at random at each training iteration; training on these random subsets greatly reduces computation time and shortens the convergence time of K-means.
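A minimal pure-Python sketch of mini-batch K-means in the style of Sculley's algorithm: each iteration assigns a random mini-batch to the nearest centres and moves each centre towards its points with a per-centre learning rate of 1/count. The farthest-first initialisation is an assumption made to keep the example deterministic (scikit-learn's `MiniBatchKMeans` defaults to k-means++), and the toy 2-D points stand in for URL feature vectors.

```python
import random


def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(centers, x):
    return min(range(len(centers)), key=lambda j: _sqdist(centers[j], x))


def mini_batch_kmeans(X, k, batch_size=4, iters=50, seed=0):
    rng = random.Random(seed)
    # Farthest-first initialisation (an assumption; keeps the toy run deterministic).
    centers = [list(rng.choice(X))]
    while len(centers) < k:
        far = max(X, key=lambda x: min(_sqdist(c, x) for c in centers))
        centers.append(list(far))
    counts = [0] * k
    for _ in range(iters):
        batch = rng.sample(X, min(batch_size, len(X)))   # random mini-batch
        nearest = [_nearest(centers, x) for x in batch]  # assign before updating
        for x, j in zip(batch, nearest):
            counts[j] += 1
            eta = 1.0 / counts[j]                         # per-centre learning rate
            centers[j] = [(1 - eta) * c + eta * xi for c, xi in zip(centers[j], x)]
    labels = [_nearest(centers, x) for x in X]
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, labels = mini_batch_kmeans(points, k=2)
print(labels)
```

Each iteration touches only `batch_size` points instead of the whole data set, which is where the time saving described above comes from.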
Specifically, sampling follows the Uncertainty & Diversity criterion, i.e., it selects, as far as possible, a sample set about which the current model is most uncertain and which is rich in diversity. The procedure is: 1) score the new data Dnew with the current model; 2) extract the white (not-malicious) samples on which the model is most uncertain to form Duncertain, with uncertainty measured from the model scores; 3) run K-means clustering on Duncertain and take the several most uncertain samples from each cluster to form the URL sample set to be labeled.
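The three-step Uncertainty & Diversity procedure can be sketched as follows. The sketch assumes the current model outputs a probability of maliciousness per sample, interprets "white" samples as those scored below 0.5, and measures uncertainty as distance from 0.5; these choices, the helper names, and the use of plain Lloyd's K-means are assumptions, not details given in the patent.

```python
def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _kmeans(points, k, iters=20):
    """Plain Lloyd's K-means with farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(_sqdist(c, p) for c in centers))))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: _sqdist(centers[j], p)) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return [min(range(k), key=lambda j: _sqdist(centers[j], p)) for p in points]


def select_for_labeling(samples, scores, n_uncertain, k, per_cluster):
    # 1) `scores` are the current model's malicious probabilities for Dnew.
    # 2) Duncertain: white samples (score < 0.5) closest to the 0.5 boundary.
    white = [i for i, s in enumerate(scores) if s < 0.5]
    white.sort(key=lambda i: 0.5 - scores[i])
    uncertain = white[:n_uncertain]
    # 3) Cluster Duncertain for diversity; keep the most uncertain per cluster.
    labels = _kmeans([samples[i] for i in uncertain], k)
    picked = []
    for c in range(k):
        members = [i for i, lab in zip(uncertain, labels) if lab == c]
        picked.extend(members[:per_cluster])   # `uncertain` is already sorted
    return picked


samples = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 5), (0, 0.5)]
scores = [0.45, 0.40, 0.48, 0.30, 0.90, 0.20]
print(select_for_labeling(samples, scores, n_uncertain=4, k=2, per_cluster=1))
```

Clustering before picking keeps the expert from spending the labeling budget on many near-identical uncertain URLs from one region of feature space.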
The URL sample set to be labeled is then labeled according to whether each sample is judged malicious, dividing it into a labeled malicious URL sample set and a non-labeled URL sample set. Samples that the expert cannot decide are left unlabeled and placed in the non-labeled URL sample set.
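The split after expert review, including the handling of undecided samples, can be sketched as below. The verdict encoding (`True` for malicious, `False` for not malicious, `None` when the expert cannot decide) and the function name are hypothetical.

```python
def split_by_verdict(to_label, verdicts):
    """Split the URL sample set to be labeled after expert review.

    verdicts maps URL -> True (malicious), False (not malicious) or
    None (expert cannot decide); missing URLs count as undecided.
    """
    labeled_malicious = {u for u in to_label if verdicts.get(u) is True}
    non_labeled = set(to_label) - labeled_malicious   # includes undecided ones
    return labeled_malicious, non_labeled


to_label = {"u1", "u2", "u3"}
verdicts = {"u1": True, "u2": False, "u3": None}
mal, non = split_by_verdict(to_label, verdicts)
print(sorted(mal), sorted(non))
```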
S3, training the SVM with the updated malicious URL training sample set to classify the updated URL test data set, and outputting an unlabeled URL data set.
As shown in Fig. 2, the expert can label multiple times, gradually expanding the labeled set L, and the model's performance keeps improving as it is retrained on L over multiple rounds. In the malicious URL detection method of the present invention, as shown in Fig. 3, the expert labels and expands the positive set P over multiple rounds, and the model is relearned at each iteration.
The malicious URL detection method grows a new model on top of existing knowledge (i.e., the malicious URL training sample set), so that the existing knowledge yields black-sample labels (i.e., the malicious URL data set) with high precision but low recall. The method can be divided into two stages. The first stage is step S1: samples from the malicious URL training sample set are mixed as spies into the URL data set to be analyzed, and multiple rounds of EM iteration are performed. The second stage comprises steps S2 and S3: by examining the score distribution of the spy samples, all samples in the URL data set to be analyzed whose scores fall below the 10% quantile of the spy scores are marked and collected into the updated URL test data set, and multiple rounds of EM iteration are then performed on the updated URL test data set.
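The spy-threshold step of the second stage can be illustrated with a short sketch. Here `score` is a hypothetical stand-in for the model obtained after mixing the spies into the data set to be analyzed (the EM training itself is omitted), the 10% quantile uses the nearest-rank definition, and the toy samples are plain numbers used directly as their own scores.

```python
import math
import random


def nearest_rank_quantile(values, q):
    """q-quantile by the nearest-rank method (0 < q <= 1)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def spy_split(positives, to_analyze, score, spy_fraction=0.1, q=0.10, seed=0):
    """Return the spy-score threshold and the samples scoring below it."""
    rng = random.Random(seed)
    n_spies = max(1, int(spy_fraction * len(positives)))
    spies = rng.sample(positives, n_spies)
    # With the spy technique, the spies are mixed into the unlabeled data
    # before training; `score` stands in for the model that training produces.
    threshold = nearest_rank_quantile([score(s) for s in spies], q)
    below = [u for u in to_analyze if score(u) < threshold]
    return threshold, below


positives = [0.60, 0.70, 0.80, 0.90, 0.95]
to_analyze = [0.10, 0.50, 0.65, 0.70]
print(spy_split(positives, to_analyze, score=lambda s: s, spy_fraction=1.0))
```

Since spies are known malicious, a sample scoring below almost all of them (here below their 10% quantile) is unlikely to be malicious, which is why the threshold is read off the spy distribution rather than fixed in advance.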
EM (expectation maximization) can be understood as an extension of MLE (maximum likelihood estimation) to settings with hidden variables: the E-step fills in the missing values, and the M-step re-estimates the model from the most recent fill, so the final model emerges after many rounds.
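To make the E-step/M-step description concrete, here is EM for a one-dimensional mixture of two Gaussians, a textbook example rather than the patent's model: the hidden variable is which component generated each point, the E-step fills it in with posterior probabilities, and the M-step re-estimates means, variances and the mixing weight from those filled-in values. The data and starting values are illustrative assumptions.

```python
import math


def _normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(xs, mu=(0.0, 1.0), var=(1.0, 1.0), pi=0.5, iters=50):
    mu0, mu1 = mu
    v0, v1 = var
    for _ in range(iters):
        # E-step: fill in the hidden component label with its posterior probability.
        r = []
        for x in xs:
            p1 = pi * _normal_pdf(x, mu1, v1)
            p0 = (1 - pi) * _normal_pdf(x, mu0, v0)
            r.append(p1 / (p0 + p1))
        # M-step: re-estimate parameters from the filled-in responsibilities.
        n1 = sum(r)
        n0 = len(xs) - n1
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu0 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n0
        v1 = max(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1, 1e-6)
        v0 = max(sum((1 - ri) * (x - mu0) ** 2 for ri, x in zip(r, xs)) / n0, 1e-6)
        pi = n1 / len(xs)
    return (mu0, mu1), (v0, v1), pi


xs = [-2.1, -1.9, -2.0, -2.2, -1.8, 3.0, 3.1, 2.9, 3.2, 2.8]
means, variances, weight = em_two_gaussians(xs, mu=(-1.0, 1.0))
print(means, weight)
```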
Further, in this embodiment, the Active Learning classifier is based on GBRT (gradient boosting regression tree), so that one execution of the malicious URL detection method produces a GBRT model.
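The patent does not describe the GBRT configuration. As an assumption-laden illustration, the sketch below fits gradient boosting with depth-1 regression trees (stumps) under squared loss, so each round fits a stump to the current residuals (the negative gradient) and adds it with a learning rate; using 0/1 malicious labels as regression targets and the toy one-feature data are also assumptions.

```python
def _fit_stump(X, residuals):
    """Best single split (feature, threshold) under squared error."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), mean, mean)  # fallback: constant stump
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            left = [r for x, r in zip(X, residuals) if x[f] <= thr]
            right = [r for x, r in zip(X, residuals) if x[f] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best[0]:
                best = (sse, f, thr, lm, rm)
    return best[1:]


def _stump_predict(stump, x):
    f, thr, left_value, right_value = stump
    return left_value if x[f] <= thr else right_value


def gbrt_fit(X, y, n_trees=50, learning_rate=0.1):
    base = sum(y) / len(y)                      # initial constant prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # Residuals are the negative gradient of the squared loss.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = _fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * _stump_predict(stump, x) for p, x in zip(pred, X)]
    return base, learning_rate, stumps


def gbrt_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(_stump_predict(s, x) for s in stumps)


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]          # 1 = malicious (assumed encoding)
model = gbrt_fit(X, y)
print([round(gbrt_predict(model, x), 3) for x in X])
```

The small learning rate makes each stump correct only part of the remaining error, so the fitted scores approach the 0/1 targets gradually over the rounds.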
Further, step S3 also includes subtracting the unlabeled URL data set from the URL data set to be analyzed, thereby obtaining a final malicious URL data set.
As shown in Fig. 4, which is a functional block diagram of a malicious URL detection system according to a preferred embodiment of the present invention, the malicious URL detection system includes:
the active learning module 100, configured to acquire a URL data set to be analyzed and a malicious URL training sample set, train an SVM (support vector machine) with the malicious URL training sample set, and use it to classify the URL data set to be analyzed to obtain a malicious URL data set and a URL data set to be labeled;
The active learning module 100 uses an active learning method, labeling the URL data set to be analyzed with the malicious URL training sample set serving as labels. Preferably, while the SVM trained on the malicious URL training sample set classifies the URL data set to be analyzed, a data labeling expert also supervises and iteratively refines the results to ensure label accuracy. For example, in Fig. 1, suppose the URL data set to be analyzed consists of original unlabeled data x1, x2, x3, …; the SVM serves as the Active Learning classifier and labels x1, x2, x3, …, while a data labeling expert supervises and iteratively refines the labeling.
The active learning module 100 does not restrict the specific type of the Active Learning classifier in Fig. 1; supervised classification, in which the URL data set to be analyzed is directly split into two classes (binary classification) based on the malicious URL training sample set, is the simplest and most direct method.
The labeling module 200 is configured to cluster the URL data set to be labeled with a clustering algorithm to obtain a URL sample set to be labeled; label this sample set according to a judgment of whether each sample is malicious, dividing it into a labeled malicious URL sample set and a non-labeled URL sample set; take the union of the labeled malicious URL sample set and the malicious URL training sample set to obtain an updated malicious URL training sample set; and subtract the labeled malicious URL sample set from the URL data set to be labeled to obtain an updated URL test data set;
the clustering algorithm is a K-means clustering algorithm, a mean-shift clustering algorithm, a density-based clustering algorithm, an expectation-maximization clustering algorithm based on a Gaussian mixture model, an agglomerative hierarchical clustering algorithm, or a graph community detection method.
Preferably, in this embodiment, the labeling module 200 is further configured to cluster the URL data set to be labeled with the Mini-Batch K-means algorithm, thereby obtaining the URL sample set to be labeled.
The Mini-Batch K-means algorithm is a clustering model that preserves clustering accuracy as far as possible while greatly reducing computation time: it uses mini-batches to cut computation while still attempting to optimize the objective function. A mini-batch is a subset of the data drawn at random at each training iteration; training on these random subsets greatly reduces computation time and shortens the convergence time of K-means.
Specifically, sampling follows the Uncertainty & Diversity criterion, i.e., it selects, as far as possible, a sample set about which the current model is most uncertain and which is rich in diversity. The procedure is: 1) score the new data Dnew with the current model; 2) extract the white (not-malicious) samples on which the model is most uncertain to form Duncertain, with uncertainty measured from the model scores; 3) run K-means clustering on Duncertain and take the several most uncertain samples from each cluster to form the URL sample set to be labeled.
The URL sample set to be labeled is then labeled according to whether each sample is judged malicious, dividing it into a labeled malicious URL sample set and a non-labeled URL sample set. Samples that the expert cannot decide are left unlabeled and placed in the non-labeled URL sample set.
The output module 300 is configured to train the SVM with the updated malicious URL training sample set to classify the updated URL test data set, and to output an unlabeled URL data set.
It is understood that the output module 300 is further configured to subtract the unlabeled URL data set from the URL data set to be analyzed, so as to obtain a final malicious URL data set.
The malicious URL detection method and system can effectively discover latent malicious URL attacks. They can be deployed as a supplement to existing systems and can help network security engineers identify latent attack patterns, so that these can be promptly incorporated into existing systems. The method and system are novel in design and highly practical.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.
Claims (8)
1. A malicious URL detection method, characterized by comprising the following steps:
S1, acquiring a URL data set to be analyzed and a malicious URL training sample set; training an SVM (support vector machine) with the malicious URL training sample set and using it to classify the URL data set to be analyzed, yielding a malicious URL data set and a URL data set to be labeled;
S2, clustering the URL data set to be labeled with a clustering algorithm to obtain a URL sample set to be labeled; labeling this sample set according to a judgment of whether each sample is malicious, dividing it into a labeled malicious URL sample set and a non-labeled URL sample set; taking the union of the labeled malicious URL sample set and the malicious URL training sample set to obtain an updated malicious URL training sample set; and subtracting the labeled malicious URL sample set from the URL data set to be labeled to obtain an updated URL test data set;
S3, training the SVM with the updated malicious URL training sample set to classify the updated URL test data set, and outputting an unlabeled URL data set.
2. The malicious URL detection method according to claim 1, wherein the clustering algorithm is a K-means clustering algorithm, a mean-shift clustering algorithm, a density-based clustering algorithm, a Gaussian mixture model-based expectation-maximization clustering algorithm, an agglomerative hierarchical clustering algorithm, or a graph community detection method.
3. The malicious URL detection method according to claim 2, wherein in step S2, the Mini-Batch K-means algorithm is used to cluster the URL data set to be labeled, thereby obtaining the URL sample set to be labeled.
4. The malicious URL detection method according to claim 1, wherein step S3 further comprises subtracting the unlabeled URL data set from the URL data set to be analyzed, thereby obtaining a final malicious URL data set.
5. A malicious URL detection system, comprising:
the active learning module (100), used for acquiring a URL data set to be analyzed and a malicious URL training sample set, training an SVM (support vector machine) with the malicious URL training sample set, and using it to classify the URL data set to be analyzed to obtain a malicious URL data set and a URL data set to be labeled;
the labeling module (200), used for clustering the URL data set to be labeled with a clustering algorithm to obtain a URL sample set to be labeled; labeling this sample set according to a judgment of whether each sample is malicious, dividing it into a labeled malicious URL sample set and a non-labeled URL sample set; taking the union of the labeled malicious URL sample set and the malicious URL training sample set to obtain an updated malicious URL training sample set; and subtracting the labeled malicious URL sample set from the URL data set to be labeled to obtain an updated URL test data set;
and the output module (300), used for training the SVM with the updated malicious URL training sample set to classify the updated URL test data set, and outputting the unlabeled URL data set.
6. The malicious URL detection system according to claim 5, wherein the clustering algorithm is a K-means clustering algorithm, a mean-shift clustering algorithm, a density-based clustering algorithm, a Gaussian mixture model-based expectation-maximization clustering algorithm, an agglomerative hierarchical clustering algorithm, or a graph community detection method.
7. The malicious URL detection system according to claim 6, wherein the labeling module (200) is further configured to cluster the URL data set to be labeled with the Mini-Batch K-means algorithm, thereby obtaining the URL sample set to be labeled.
8. The malicious URL detection system according to claim 5, wherein the output module (300) is configured to train the SVM with the updated malicious URL training sample set to classify the updated URL test data set and output the unlabeled URL data set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911207542.5A CN110912917A (en) | 2019-11-29 | 2019-11-29 | Malicious URL detection method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911207542.5A CN110912917A (en) | 2019-11-29 | 2019-11-29 | Malicious URL detection method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110912917A (en) | 2020-03-24 |
Family ID: 69821092
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911207542.5A Pending CN110912917A (en) | 2019-11-29 | 2019-11-29 | Malicious URL detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110912917A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102176701A (en) * | 2011-02-18 | 2011-09-07 | 哈尔滨工业大学 | Active learning based network data anomaly detection method |
CN103150369A (en) * | 2013-03-07 | 2013-06-12 | 人民搜索网络股份公司 | Method and device for identifying cheat web-pages |
CN104992184A (en) * | 2015-07-02 | 2015-10-21 | 东南大学 | Multiclass image classification method based on semi-supervised extreme learning machine |
CN109831460A (en) * | 2019-03-27 | 2019-05-31 | 杭州师范大学 | Web attack detection method based on co-training |
WO2019109743A1 (en) * | 2017-12-07 | 2019-06-13 | 阿里巴巴集团控股有限公司 | URL attack detection method and apparatus, and electronic device |
CN110413924A (en) * | 2019-07-18 | 2019-11-05 | 广东石油化工学院 | Web page classification method based on semi-supervised multi-view learning |
US20190349399A1 (en) * | 2017-10-31 | 2019-11-14 | Guangdong University Of Technology | Character string classification method and system, and character string classification device |
Application Events
2019-11-29: application CN201911207542.5A filed in CN, published as CN110912917A (status: active, Pending)
Non-Patent Citations (2)
Title |
---|
Ya-Lin Zhang, Longfei Li, Jun Zhou, et al.: "POSTER: A PU Learning based System for Potential Malicious URL Detection", ACM * |
刘露, 彭涛, 左万利, 戴耀康: "A Clustering-Based PU Active Text Classification Method" (一种基于聚类的 PU 主动文本分类方法), Journal of Software (软件学报) * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111680742A (en) * | 2020-06-04 | 2020-09-18 | 甘肃电力科学研究院 | Attack data labeling method applied to new energy plant station network security field |
CN111523620A (en) * | 2020-07-03 | 2020-08-11 | 北京每日优鲜电子商务有限公司 | Dynamic adjustment method and commodity verification method for commodity identification model |
CN111523620B (en) * | 2020-07-03 | 2020-10-20 | 北京每日优鲜电子商务有限公司 | Dynamic adjustment method and commodity verification method for commodity identification model |
CN112615861A (en) * | 2020-12-17 | 2021-04-06 | 赛尔网络有限公司 | Malicious domain name identification method and device, electronic equipment and storage medium |
CN114553496A (en) * | 2022-01-28 | 2022-05-27 | 中国科学院信息工程研究所 | Malicious domain name detection method and device based on semi-supervised learning |
CN114553496B (en) * | 2022-01-28 | 2022-11-15 | 中国科学院信息工程研究所 | Malicious domain name detection method and device based on semi-supervised learning |
Similar Documents
Publication | Title |
---|---|
CN110912917A (en) | Malicious URL detection method and system |
CN107067025B (en) | Text data automatic labeling method based on active learning |
US7570816B2 (en) | Systems and methods for detecting text |
CN109871954B (en) | Training sample generation method, abnormality detection method and apparatus |
Kosmidis et al. | Machine learning and images for malware detection and classification |
CN105897517A (en) | Network traffic abnormality detection method based on SVM (Support Vector Machine) |
CN111126576B (en) | Deep learning training method |
CN107943856A (en) | File classification method and system based on expanded labeled samples |
CN111259219B (en) | Malicious webpage identification model establishment method, malicious webpage identification method and malicious webpage identification system |
CN113489685B (en) | Secondary feature extraction and malicious attack identification method based on kernel principal component analysis |
CN111222471A (en) | Zero-shot training and related classification method based on a self-supervised domain-aware network |
CN108446559A (en) | APT organization recognition method and device |
Fang et al. | Sparse similarity metric learning for kinship verification |
CN103942749A (en) | Hyperspectral ground feature classification method based on modified cluster hypothesis and semi-supervised extreme learning machine |
US8699796B1 (en) | Identifying sensitive expressions in images for languages with large alphabets |
CN116051479A (en) | Textile defect identification method integrating cross-domain transfer and anomaly detection |
Chu et al. | Co-training based on semi-supervised ensemble classification approach for multi-label data stream |
CN113609488A (en) | Vulnerability detection method and system based on self-supervised learning and multichannel hypergraph neural network |
Jiang et al. | Dynamic proposal sampling for weakly supervised object detection |
Cheng et al. | Tracing retinal blood vessels by matrix-forest theorem of directed graphs |
Ghanmi et al. | Table detection in handwritten chemistry documents using conditional random fields |
Moller et al. | Active learning for the classification of species in underwater images from a fixed observatory |
Ullman et al. | Smart vulnerability assessment for scientific cyberinfrastructure: An unsupervised graph embedding approach |
CN113343123A (en) | Training method and detection method for a generative adversarial multi-relational graph network |
CN117516937A (en) | Rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200324 |