WO2019169982A1 - URL abnormality locating method, apparatus, server, and storage medium - Google Patents

URL abnormality locating method, apparatus, server, and storage medium

Info

Publication number
WO2019169982A1
Authority
WO
WIPO (PCT)
Prior art keywords
url
instance
abnormal
package
exception
Prior art date
Application number
PCT/CN2019/073629
Other languages
English (en)
French (fr)
Inventor
Yalin Zhang (张雅淋)
Longfei Li (李龙飞)
Original Assignee
Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Priority date: 2018-03-06
Filing date
Publication date
Application filed by Alibaba Group Holding Limited
Priority to SG11202005828UA
Priority to EP19763581.6A (EP3716571B1)
Publication of WO2019169982A1 (in Chinese)
Priority to US16/878,521 (US10819745B2)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • The embodiments of this specification relate to the field of Internet technologies, and in particular to a URL abnormality locating method, apparatus, server, and storage medium.
  • The embodiments of this specification provide a URL abnormality locating method, apparatus, server, and storage medium.
  • An embodiment of this specification provides a URL abnormality locating method, including: performing field segmentation on a URL to obtain a multi-instance bag composed of the instances corresponding to the respective fields; inputting the multi-instance bag into a URL abnormality locating model based on multi-instance learning to predict abnormal instances; and locating the corresponding abnormal fields according to the abnormal instances.
  • An embodiment of this specification provides a URL abnormality locating training method, including: collecting a URL sample set composed of multiple URL samples; performing field segmentation on each URL sample in the URL sample set to obtain, for each URL sample, a multi-instance bag composed of the instances corresponding to its fields; collecting the multi-instance bags of all URL samples to obtain a multi-instance bag set; performing abnormal/non-abnormal instance classification training on the multi-instance bag set based on a multi-instance learning algorithm; and obtaining the URL abnormality locating model based on the classification training.
  • An embodiment of this specification provides a URL abnormality locating apparatus, including: a segmentation unit, configured to perform field segmentation on a URL to obtain a multi-instance bag composed of the instances corresponding to the respective fields; a prediction unit, configured to input the multi-instance bag into a URL abnormality locating model based on multi-instance learning to predict abnormal instances; and a locating unit, configured to locate the corresponding abnormal fields according to the abnormal instances.
  • An embodiment of this specification provides a URL abnormality locating training apparatus, including: a sample obtaining unit, configured to collect a URL sample set composed of multiple URL samples; a sample segmentation unit, configured to perform field segmentation on each URL sample in the URL sample set to obtain, for each URL sample, a multi-instance bag composed of the instances corresponding to its fields; an instance bag collection unit, configured to collect the multi-instance bags of all URL samples to obtain a multi-instance bag set; and a training unit, configured to perform abnormal/non-abnormal instance classification training on the multi-instance bag set based on a multi-instance learning algorithm to obtain the URL abnormality locating model.
  • An embodiment of this specification provides a server, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of any of the methods described above.
  • An embodiment of this specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the methods described above.
  • A URL is represented as a bag composed of multiple instances corresponding to multiple fields, and a URL abnormality locating model predicts the abnormal instances, thereby locating the abnormal fields in the URL.
  • URL abnormality locating based on multi-instance learning can better predict potential threats not yet discovered in daily access data. Because the location of the abnormality can be determined for an abnormal URL, it provides strong support for discovering potential threats, establishing new security rules, and building security systems.
  • FIG. 1 is a schematic diagram of a URL abnormality locating scenario according to an embodiment of this specification;
  • FIG. 2 is a flowchart of a URL abnormality locating method according to the first aspect of this specification;
  • FIG. 3 is a flowchart of a URL abnormality locating training method according to the second aspect of this specification;
  • FIG. 4 is a schematic structural diagram of a URL abnormality locating apparatus according to the third aspect of this specification;
  • FIG. 5 is a schematic structural diagram of a URL abnormality locating training apparatus according to the fourth aspect of this specification;
  • FIG. 6 is a schematic structural diagram of a URL abnormality locating server according to the fifth aspect of this specification.
  • The client submits a URL access request to the server (network side); the server parses the request to obtain the URL and performs multi-instance-learning-based abnormality locating on it. Specifically, the server trains a URL abnormality locating model in advance based on a multi-instance learning algorithm, uses the model to predict abnormal instances, and then determines the URL fields corresponding to those instances, thereby locating the URL abnormality.
  • "URL abnormality locating" in the embodiments of the present invention differs from merely indicating whether the URL as a whole is abnormal: it determines the specific position of the abnormal field in the URL, which facilitates more accurate analysis and prevention of the abnormality.
  • An embodiment of this specification provides a URL abnormality locating method.
  • The URL abnormality locating method provided by the embodiment of this specification includes the following steps S201 to S203.
  • S201: Perform field segmentation on the URL to obtain a multi-instance bag composed of the instances corresponding to the respective fields.
  • Based on multiple instance learning (MIL), the URL is segmented to obtain a multi-instance bag.
  • In MIL, data is given in the form of bags, and a bag usually contains multiple instances. In the embodiments of this specification, each URL corresponds to one bag containing multiple instances, so a URL is represented by a "multi-instance bag".
  • A basic URL contains the scheme (or protocol), the server name (or IP address), the path, and the file name, for example "protocol://authority/path?query".
  • Field segmentation may be performed on the entire URL, or only on high-risk fields.
  • For example, only the server name field is further segmented into multiple instances, or only the part after the # (hash sign) is segmented to obtain multiple instances.
  • An instance can be represented by the feature vector of the corresponding field.
  • For example, the pattern, character count, letter count, and similar properties of a field are represented as a feature vector to obtain the instance corresponding to that field.
  • S202: Input the multi-instance bag into a preset URL abnormality locating model to predict abnormal instances.
  • A URL abnormality locating model may be obtained in advance by training on multiple URL samples with a multi-instance learning algorithm (see FIG. 3 and the related description for the training process). The multi-instance bag corresponding to the URL to be predicted is then input into the model, which predicts the value of the abnormality flag of each instance in the bag and thereby whether each instance is an abnormal instance.
  • In traditional multi-instance learning, labels are given only at the bag level.
  • For a bag, as long as one of its instances is positive, the bag is a positive bag; all instances in a negative bag are negative instances.
  • The label of a bag is known, while the labels of its instances are unknown. Compared with traditional supervised learning, multi-instance learning therefore has less supervision information and is more difficult.
  • In the embodiments of this specification, abnormal and non-abnormal instances are distinguished by assigning labels to individual instances.
  • A positive instance is an abnormal instance (for example, its abnormality flag has the value 1 or it carries a positive label), and a positive bag is an abnormal bag; a negative instance is a non-abnormal instance (for example, its abnormality flag has the value 0 or it carries a negative label), and a negative bag is a non-abnormal bag.
  • Once the abnormal instances are predicted, the abnormal fields corresponding to them can be determined, that is, the abnormal positions in the URL.
  • A bag is positive as long as one of its instances is positive, and all instances in a negative bag are negative. Therefore, if the multi-instance bag of a URL contains at least one abnormal instance, the URL is determined to be an abnormal URL; if it contains no abnormal instance, the URL is determined to be a non-abnormal URL.
  • FIG. 3 is a flowchart of a URL abnormality locating training method according to the second aspect of this specification, which includes:
  • S301: Collect a URL sample set consisting of multiple URL samples.
  • S302: Perform field segmentation on each URL sample in the URL sample set, and obtain, for each URL sample, a multi-instance bag composed of the instances corresponding to its fields.
  • The raw data of a URL sample is segmented; for each segmented field, different features can be extracted to represent the field as an instance, and finally the different instances of one URL are collected into a bag.
  • S303: Collect the multi-instance bags of the URL samples to obtain a multi-instance bag set.
  • S304: Based on a multi-instance learning algorithm, perform abnormal/non-abnormal instance classification training on the multi-instance bag set.
  • Instance classification training can be understood as attempting to mine the labels of the instances in each bag in order to train a classifier that classifies instances.
  • Abnormal and non-abnormal instances are distinguished by the value of each instance's abnormality flag.
  • Specifically, the classification training includes: initializing the value of the abnormality flag of each instance in the multi-instance bag set, iteratively learning the flag values, and updating them to obtain the final abnormality flag value of each instance.
  • Because instance labels are not observable, plausible labels are first initialized and then updated iteratively during training to optimize the result. As training progresses, the instance labels increasingly approach the truth, allowing the model to gradually discover possible abnormal (harmful) instances so that the final model can predict the likely abnormal fields of future URLs.
  • At initialization, one or several instances in each positive bag are randomly selected and given a positive label, and the remaining instances in that positive bag are given a negative label; all instances in a negative bag are given a negative label.
  • Each instance thus receives an instance label, and a classifier is trained on these instances.
  • The classifier then predicts all instances in order to update their labels.
  • Specifically, for a negative bag, the instance labels remain unchanged (all negative); for the instances in a positive bag, the labels are first modified according to the classifier's results, and then a round of checking is performed: if all instances of a positive bag have been given a negative label, the instance with the lowest score (that is, the least confident negative) is selected and given a positive label.
  • This completes the update of the instance labels, and a new classifier is trained on the updated result.
  • The initial assignment of instance labels is fairly random; during classifier training the instance labels are gradually corrected, and the corrected labels in turn make the trained classifier more accurate, thereby achieving an optimizing effect.
  • An example of an abnormal URL is: http://render.alipay.com/p/s/alipay_site/wait?mintime=3>SCRiPT={Sleep}&maxtime=5&fromspanner=goldetfprod_502
  • Conventional approaches represent each URL as a feature vector of a certain dimension.
  • Such a method may detect an abnormal URL, but cannot precisely locate the abnormal field in the above example. Finding the exact field in which the abnormality lies is desirable, so the purpose of this method is to detect the abnormal field.
  • The problem is therefore formalized as a multi-instance learning problem.
  • Multi-instance learning data is given in the form of bags: a bag has a label and contains multiple instances whose labels are unknown. If any instance in a bag is a positive instance, the bag is a positive bag; if all instances are negative instances, the bag is a negative bag.
  • A URL is a bag, represented as containing three instances.
  • For each field value, the following patterns are extracted: total number of characters, total number of letters, total number of digits, total number of symbols, number of distinct characters, number of distinct letters, number of distinct digits, and number of distinct symbols.
  • The three vectors in the bag of the above example can be represented as [1,0,1,0,1,0,1,0], [1,0,1,0,1,0,1,0], and [15,11,3,1,13,9,3,1], respectively; the URL is a normal URL, so the bag's label is negative (that is, a non-abnormal URL).
  • An embodiment of this specification provides a URL abnormality locating apparatus.
  • Referring to FIG. 4, the apparatus includes:
  • a segmentation unit 401, configured to perform field segmentation on the URL to obtain a multi-instance bag composed of the instances corresponding to the respective fields;
  • a prediction unit 402, configured to input the multi-instance bag into a preset URL abnormality locating model to predict abnormal instances;
  • a locating unit 403, configured to locate the corresponding abnormal fields according to the abnormal instances.
  • The apparatus further includes:
  • a model training unit 404, configured to train on multiple URL samples based on a multi-instance learning algorithm to obtain the URL abnormality locating model.
  • The model training unit 404 includes:
  • a sample segmentation sub-unit 4041, configured to perform field segmentation on each URL sample in the URL sample set and obtain, for each URL sample, a multi-instance bag composed of the instances corresponding to its fields;
  • an instance bag collection sub-unit 4042, configured to collect the multi-instance bags of the URL samples to obtain a multi-instance bag set;
  • a training sub-unit 4043, configured to perform abnormal/non-abnormal instance classification training on the multi-instance bag set based on a multi-instance learning algorithm to obtain the URL abnormality locating model.
  • The training sub-unit 4043 is specifically configured to initialize the value of the abnormality flag of each instance in the multi-instance bag set, iteratively learn the flag values, and update them to obtain the final abnormality flag value of each instance.
  • The prediction unit 402 is specifically configured to predict, according to the URL abnormality locating model, the value of the abnormality flag of each instance in the multi-instance bag, thereby determining whether each instance is an abnormal instance.
  • The apparatus further includes:
  • an abnormal URL determining unit 405, configured to determine whether the URL is an abnormal URL: if the instance bag corresponding to the URL contains an abnormal instance, the URL is determined to be an abnormal URL; if it contains no abnormal instance, the URL is determined to be a non-abnormal URL.
  • An instance is represented by the feature vector of the corresponding field.
  • The field is a parameter request field in the URL.
  • An embodiment of this specification provides a URL abnormality locating training apparatus. Referring to FIG. 5, the apparatus includes:
  • a sample obtaining unit 501, configured to collect a URL sample set composed of multiple URL samples;
  • a sample segmentation unit 502, configured to perform field segmentation on each URL sample in the URL sample set and obtain, for each URL sample, a multi-instance bag composed of the instances corresponding to its fields;
  • an instance bag collection unit 503, configured to collect the multi-instance bags of the URL samples to obtain a multi-instance bag set;
  • a training unit 504, configured to perform abnormal/non-abnormal instance classification training on the multi-instance bag set based on a multi-instance learning algorithm to obtain the URL abnormality locating model.
  • The training unit 504 is specifically configured to initialize the value of the abnormality flag of each instance in the multi-instance bag set, iteratively learn the flag values, and update them to obtain the final abnormality flag value of each instance.
  • The present invention further provides a server. As shown in FIG. 6, the server includes a memory 604, a processor 602, and a computer program stored in the memory 604 and executable on the processor 602; when executing the program, the processor 602 implements the steps of the URL abnormality locating method described above.
  • The bus 600 may include any number of interconnected buses and bridges, and links together various circuits, including one or more processors represented by the processor 602 and memories represented by the memory 604.
  • The bus 600 may also link various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not further described herein.
  • Bus interface 606 provides an interface between bus 600 and receiver 601 and transmitter 603.
  • The receiver 601 and the transmitter 603 may be the same component, namely a transceiver, providing a means for communicating with various other devices over a transmission medium.
  • The processor 602 is responsible for managing the bus 600 and general processing, while the memory 604 can store data used by the processor 602 when performing operations.
  • Based on the same inventive concept as the URL abnormality locating method in the foregoing embodiments, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the URL abnormality locating method described above.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

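The embodiments of this specification provide a URL abnormality locating method. A URL is represented as a bag composed of multiple instances, and a URL abnormality locating model predicts the abnormal instances, thereby locating the abnormal fields in the URL. URL abnormality locating based on multi-instance learning can better predict potential threats not yet discovered in the data.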

Description

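URL Abnormality Locating Method, Apparatus, Server, and Storage Medium
Cross-Reference to Related Applications
This patent application claims priority to Chinese Patent Application No. 201810182571.X, filed on March 6, 2018 and entitled "URL Abnormality Locating Method, Apparatus, Server, and Storage Medium", the entire contents of which are incorporated herein by reference.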
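Technical Field
The embodiments of this specification relate to the field of Internet technologies, and in particular to a URL abnormality locating method, apparatus, server, and storage medium.
Background
In Internet application scenarios, there is a large volume of access to web addresses (URLs, Uniform Resource Locators) every day; at the same time, malicious actors frequently attempt to launch attacks through illegitimate URL accesses.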
Summary
The embodiments of this specification provide a URL abnormality locating method, apparatus, server, and storage medium.
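In a first aspect, an embodiment of this specification provides a URL abnormality locating method, including: performing field segmentation on a URL to obtain a multi-instance bag composed of the instances corresponding to the respective fields; inputting the multi-instance bag into a URL abnormality locating model based on multi-instance learning to predict abnormal instances; and locating the corresponding abnormal fields according to the abnormal instances.
In a second aspect, an embodiment of this specification provides a URL abnormality locating training method, including: collecting a URL sample set composed of multiple URL samples; performing field segmentation on each URL sample in the URL sample set to obtain, for each URL sample, a multi-instance bag composed of the instances corresponding to its fields; collecting the multi-instance bags of all URL samples to obtain a multi-instance bag set; performing abnormal/non-abnormal instance classification training on the multi-instance bag set based on a multi-instance learning algorithm; and obtaining the URL abnormality locating model based on the classification training.
In a third aspect, an embodiment of this specification provides a URL abnormality locating apparatus, including: a segmentation unit, configured to perform field segmentation on a URL to obtain a multi-instance bag composed of the instances corresponding to the respective fields; a prediction unit, configured to input the multi-instance bag into a URL abnormality locating model based on multi-instance learning to predict abnormal instances; and a locating unit, configured to locate the corresponding abnormal fields according to the abnormal instances.
In a fourth aspect, an embodiment of this specification provides a URL abnormality locating training apparatus, including: a sample obtaining unit, configured to collect a URL sample set composed of multiple URL samples; a sample segmentation unit, configured to perform field segmentation on each URL sample in the URL sample set to obtain, for each URL sample, a multi-instance bag composed of the instances corresponding to its fields; an instance bag collection unit, configured to collect the multi-instance bags of all URL samples to obtain a multi-instance bag set; and a training unit, configured to perform abnormal/non-abnormal instance classification training on the multi-instance bag set based on a multi-instance learning algorithm to obtain the URL abnormality locating model.
In a fifth aspect, an embodiment of this specification provides a server, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of any of the methods described above.
In a sixth aspect, an embodiment of this specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the methods described above.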
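The embodiments of this specification provide the following beneficial effects:
In the embodiments of this specification, a URL is represented as a bag composed of multiple instances corresponding to multiple fields, and a URL abnormality locating model predicts the abnormal instances, thereby locating the abnormal fields in the URL. URL abnormality locating based on multi-instance learning can better predict potential threats not yet discovered in daily access data. Because the location of the abnormality can be determined for an abnormal URL, it provides strong support for discovering potential threats, establishing new security rules, and building security systems.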
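Brief Description of the Drawings
FIG. 1 is a schematic diagram of a URL abnormality locating scenario according to an embodiment of this specification;
FIG. 2 is a flowchart of the URL abnormality locating method according to the first aspect of the embodiments of this specification;
FIG. 3 is a flowchart of the URL abnormality locating training method according to the second aspect of the embodiments of this specification;
FIG. 4 is a schematic structural diagram of the URL abnormality locating apparatus according to the third aspect of the embodiments of this specification;
FIG. 5 is a schematic structural diagram of the URL abnormality locating training apparatus according to the fourth aspect of the embodiments of this specification;
FIG. 6 is a schematic structural diagram of the URL abnormality locating server provided in the fifth aspect of the embodiments of this specification.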
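Detailed Description
To better understand the above technical solutions, the technical solutions of the embodiments of this specification are described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the embodiments of this specification and the specific features therein are detailed descriptions of the technical solutions of the embodiments, not limitations on the technical solutions of this specification. Where no conflict arises, the embodiments and their technical features may be combined with one another.
See FIG. 1 for a schematic diagram of the URL abnormality locating scenario of the embodiments of this specification. The client (user side) sends a URL access request to the server (network side); the server parses the request to obtain the URL and performs multi-instance-learning-based abnormality locating on it. Specifically, the server trains a URL abnormality locating model in advance based on a multi-instance learning algorithm, uses the model to predict abnormal instances, and then determines from the abnormal instances the URL fields in which they lie, thereby locating the URL abnormality. "URL abnormality locating" in the embodiments of the present invention differs from approaches that merely indicate whether the URL as a whole is abnormal: it determines the specific position of the abnormal field within the URL, which facilitates more accurate analysis and prevention of the abnormality.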
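In the first aspect, an embodiment of this specification provides a URL abnormality locating method.
Referring to FIG. 2, the URL abnormality locating method provided by the embodiment of this specification includes the following steps S201 to S203.
S201: Perform field segmentation on the URL to obtain a multi-instance bag composed of the instances corresponding to the respective fields.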
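Based on multiple instance learning (MIL), the URL is segmented to obtain a bag composed of multiple instances. Unlike traditional supervised learning, in multi-instance learning the data is given in the form of bags, and a bag usually contains multiple instances. Accordingly, in the embodiments of this specification each URL corresponds to one bag containing multiple instances, so a URL is represented as a "multi-instance bag".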
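A basic URL contains the scheme (or protocol), the server name (or IP address), the path, and the file name, for example "protocol://authority/path?query". The complete generic URI (Uniform Resource Identifier) syntax with an authority part can be written as: protocol://username:password@subdomain.domain.top-level-domain:port/directory/filename.extension?parameter=value#fragment. In the embodiments of this specification, field segmentation may be performed on the entire URL, or only on high-risk fields.
For example, only the server name field is further segmented into multiple instances; or only the part after the # (hash sign) is segmented into fields to obtain multiple instances.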
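The field segmentation of S201 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the high-risk region is the query string (the parameter request part, which the worked example later in this description treats as the attack surface), splits it on `&`, and keeps each `name=value` field undecoded so that characters such as `>` and `{` survive for feature extraction. The helper name `split_query_fields` is hypothetical.

```python
from urllib.parse import urlsplit


def split_query_fields(url: str) -> list[str]:
    """Split the query string of a URL into raw 'name=value' fields.

    Hypothetical helper: only the query (the part between '?' and '#') is
    treated as the high-risk region, and fields are kept undecoded.
    """
    query = urlsplit(url).query
    # Skip empty pieces produced by leading, trailing, or doubled '&'.
    return [field for field in query.split("&") if field]


if __name__ == "__main__":
    url = ("http://render.alipay.com/p/s/alipay_site/wait"
           "?mintime=3>SCRiPT={Sleep}&maxtime=5&fromspanner=goldetfprod_502")
    for field in split_query_fields(url):
        print(field)
```

Segmenting the server name or the fragment instead would only change which component of `urlsplit`'s result is split, and on which separator.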
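An instance can be represented by the feature vector of the corresponding field. For example, the pattern, character count, letter count, and similar properties of a field are represented as a feature vector to obtain the instance corresponding to that field.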
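S202: Input the multi-instance bag into the preset URL abnormality locating model to predict abnormal instances.
In the embodiments of this specification, a URL abnormality locating model is first obtained in advance by training on multiple URL samples with a multi-instance learning algorithm (see FIG. 3 and the related description for the training process); then the multi-instance bag corresponding to the URL to be predicted is input into the model, which predicts the value of the abnormality flag of each instance in the bag, thereby predicting whether each instance in the bag is an abnormal instance.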
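Once a model exists, S202 reduces to scoring every instance of the bag and thresholding the score into an abnormality flag (1 = abnormal, 0 = normal). The sketch below assumes the model exposes a per-instance score where higher means more abnormal; `linear_scorer`, its weights, the bias, and the default threshold of 0 are hypothetical placeholders for whatever classifier training actually produces.

```python
from typing import Callable, Sequence

Instance = Sequence[float]


def predict_instance_flags(bag: Sequence[Instance],
                           score: Callable[[Instance], float],
                           threshold: float = 0.0) -> list[int]:
    """Return one abnormality flag per instance: 1 = abnormal, 0 = normal."""
    return [1 if score(instance) > threshold else 0 for instance in bag]


def linear_scorer(weights: Sequence[float], bias: float) -> Callable[[Instance], float]:
    """Hypothetical stand-in for a trained model: score = w . x + b."""
    def score(instance: Instance) -> float:
        return sum(w * x for w, x in zip(weights, instance)) + bias
    return score


if __name__ == "__main__":
    # Toy weights that only look at the two symbol-count features.
    scorer = linear_scorer([0, 0, 0, 1, 0, 0, 0, 1], bias=-2.5)
    bag = [[16, 11, 1, 4, 14, 9, 1, 4], [1, 0, 1, 0, 1, 0, 1, 0]]
    print(predict_instance_flags(bag, scorer))  # [1, 0]
```

The first toy vector has four symbols and four distinct symbols, so its score is 4 + 4 - 2.5 = 5.5 and it is flagged; the second scores -2.5 and is not.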
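In traditional multi-instance learning algorithms, labels are given only at the bag level. For example, in standard multi-instance learning with binary classification, a bag is positive as long as one of its instances is positive, while all instances in a negative bag are negative. Note that the bag labels are known whereas the instance labels are unknown. Compared with traditional supervised learning, multi-instance learning therefore has less supervision information and is more difficult.
In the embodiments of this specification, abnormal and non-abnormal instances are distinguished by labeling the instances themselves (rather than only labeling at the bag level).
A positive bag contains at least one positive instance, while all instances in a negative bag are negative instances. Note that in the embodiments of this specification, a positive instance is an abnormal instance (for example, its abnormality flag has the value 1 or it carries a positive label), and a positive bag is an abnormal bag; a negative instance is a non-abnormal instance (for example, its abnormality flag has the value 0 or it carries a negative label), and a negative bag is a non-abnormal bag.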
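S203: Locate the corresponding abnormal fields according to the abnormal instances.
Because each instance corresponds to a specific field, once the abnormal instances have been predicted, the abnormal fields corresponding to them, that is, the abnormal positions in the URL, can be determined.
In the embodiments of the present invention, a bag is positive as long as one of its instances is positive, and all instances in a negative bag are negative. Therefore, if the multi-instance bag corresponding to a URL contains even one abnormal instance, the URL is determined to be an abnormal URL; if it contains no abnormal instance, the URL is determined to be a non-abnormal URL. In other words, once an abnormal instance has been found for a URL, the URL can be determined to be abnormal.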
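S203 and the bag-level decision just described amount to two small functions: map each flagged instance back to its field, and declare the URL abnormal if any instance is flagged. A minimal sketch, assuming the flags are in the same order as the fields produced by segmentation; the function names and the example flags are hypothetical.

```python
from typing import Sequence


def locate_abnormal_fields(fields: Sequence[str], flags: Sequence[int]) -> list[str]:
    """S203: return the fields whose instance was flagged abnormal (flag == 1)."""
    if len(fields) != len(flags):
        raise ValueError("expected exactly one flag per field")
    return [field for field, flag in zip(fields, flags) if flag == 1]


def is_abnormal_url(flags: Sequence[int]) -> bool:
    """Bag-level rule: the URL is abnormal iff at least one instance is abnormal."""
    return any(flag == 1 for flag in flags)


if __name__ == "__main__":
    fields = ["mintime=3>SCRiPT={Sleep}", "maxtime=5", "fromspanner=goldetfprod_502"]
    flags = [1, 0, 0]  # hypothetical model output for the abnormal URL
    print(is_abnormal_url(flags), locate_abnormal_fields(fields, flags))
```

Keeping the field list alongside the instance vectors is what makes the localization possible: the model only ever sees vectors, and the index links each vector back to a readable field.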
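Referring to FIG. 3, which is a flowchart of the URL abnormality locating training method according to the second aspect of the embodiments of this specification, the method includes:
S301: Collect a URL sample set composed of multiple URL samples.
S302: Perform field segmentation on each URL sample in the URL sample set, and obtain, for each URL sample, a multi-instance bag composed of the instances corresponding to its fields.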
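The raw data of each URL sample is segmented; for each segmented field, different features can be extracted to represent the field as an instance, and finally the different instances of one URL are collected into a bag.
S303: Collect the multi-instance bags of all URL samples to obtain a multi-instance bag set.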
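S302 and S303 can be sketched as building one bag per labeled URL sample and collecting the bags into a list. The segmentation and featurization functions are passed in as parameters because the text leaves both open. The `Bag` record, `build_bag_set`, and the toy `toy_segment`/`toy_featurize` functions are hypothetical, and the bag label (1 = abnormal URL, 0 = normal URL) is assumed to arrive with each sample, in line with the observation below that labels are usually known only for the URL as a whole.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Bag:
    url: str
    fields: list[str]              # segmented fields, in URL order
    instances: list[list[float]]   # one feature vector per field, same order
    label: int                     # bag label: 1 = abnormal URL, 0 = normal URL


def build_bag_set(samples: Iterable[tuple[str, int]],
                  segment: Callable[[str], list[str]],
                  featurize: Callable[[str], list[float]]) -> list[Bag]:
    """S302 + S303: turn each labeled URL sample into a bag and collect the bags."""
    bag_set = []
    for url, label in samples:
        fields = segment(url)
        bag_set.append(Bag(url, fields, [featurize(f) for f in fields], label))
    return bag_set


def toy_segment(url: str) -> list[str]:
    """Hypothetical segmenter: the '&'-separated pieces of the query string."""
    return [f for f in url.partition("?")[2].split("&") if f]


def toy_featurize(field: str) -> list[float]:
    """Hypothetical one-dimensional feature: the field length."""
    return [float(len(field))]


if __name__ == "__main__":
    bag_set = build_bag_set([("http://a.example/p?x=1&y=22", 0)], toy_segment, toy_featurize)
    print(bag_set[0].instances)  # [[3.0], [4.0]]
```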
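S304: Based on a multi-instance learning algorithm, perform abnormal/non-abnormal instance classification training on the multi-instance bag set.
The instance classification training can be understood as attempting to mine the labels of the instances in each bag in order to train a classifier that classifies instances. Abnormal and non-abnormal instances are distinguished by the value of each instance's abnormality flag.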
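In an optional manner, the classification training specifically includes: initializing the value of the abnormality flag of each instance in the multi-instance bag set, iteratively learning the flag values, and updating them to obtain the final abnormality flag value of each instance.
Because instance labels are not observable, plausible labels are first initialized, and the instance labels are then updated iteratively during training to optimize the result. As training progresses, the instance labels increasingly approach the truth, so that the model gradually discovers possible abnormal (harmful) instances and the final model can predict the likely abnormal fields of future URLs.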
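Specifically, as described above, a positive bag contains at least one positive instance, while all instances in a negative bag are negative instances. In the embodiments of this specification, a positive instance is an abnormal instance (for example, its abnormality flag has the value 1 or it carries a positive label), and a positive bag is an abnormal bag; a negative instance is a non-abnormal instance (for example, its abnormality flag has the value 0 or it carries a negative label), and a negative bag is a non-abnormal bag.
Accordingly, at initialization, one or several instances in each positive bag are randomly selected and given a positive label, and the remaining instances in that positive bag are given a negative label; for a negative bag, all of its instances are given a negative label.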
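After this initialization, every instance has an instance label, and a classifier is trained on these instances. Once the classifier has been obtained, it can be used to predict all instances and thereby update the instance labels. For example, the update proceeds as follows: for a negative bag, the labels of its instances remain unchanged (all negative); for the instances in a positive bag, their labels are first modified according to the classifier's output, and then a round of checking is performed: if all instances of some positive bag have been given a negative label, the instance with the lowest score (that is, the one judged negative with the lowest confidence) is selected and given a positive label. This completes the update of the instance labels, and a new classifier is then trained on the updated result.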
The above process is repeated until the instance labels no longer change between two consecutive rounds.
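The initialize, train, update, check, and repeat-until-stable procedure of S304 can be sketched as below. The text does not name a classifier, so this sketch substitutes a deliberately simple one as an assumption: a nearest-centroid scorer whose score is the distance to the centroid of negatively labeled instances minus the distance to the centroid of positively labeled instances (higher means more abnormal). Under that convention, the patent's "least confident negative" is the instance with the highest score. Only one instance per positive bag is picked at initialization, and `train_instance_labels`, `max_rounds`, and `seed` are hypothetical names.

```python
import math
import random
from typing import Callable, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def _centroid(vectors: list[Vector]) -> list[float]:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def _fit(bags: list[list[Vector]], labels: list[list[int]]) -> Scorer:
    """Stand-in classifier (assumption): nearest-centroid score, higher = more abnormal."""
    pos = [x for bag, flags in zip(bags, labels) for x, f in zip(bag, flags) if f == 1]
    neg = [x for bag, flags in zip(bags, labels) for x, f in zip(bag, flags) if f == 0]
    c_pos, c_neg = _centroid(pos), _centroid(neg)
    return lambda x: math.dist(x, c_neg) - math.dist(x, c_pos)


def train_instance_labels(bags: list[list[Vector]], bag_labels: list[int],
                          max_rounds: int = 50, seed: int = 0):
    """Infer instance labels (1 = abnormal) from bag labels; return (labels, scorer)."""
    if 1 not in bag_labels or 0 not in bag_labels:
        raise ValueError("need at least one positive and one negative bag")
    if any(len(bag) == 0 for bag in bags):
        raise ValueError("every bag needs at least one instance")
    rng = random.Random(seed)
    # Initialization: one random instance of each positive bag is positive;
    # all other instances, including every instance of a negative bag, are negative.
    labels = []
    for bag, y in zip(bags, bag_labels):
        pick = rng.randrange(len(bag)) if y == 1 else -1
        labels.append([1 if i == pick else 0 for i in range(len(bag))])
    for _ in range(max_rounds):
        score = _fit(bags, labels)  # train a classifier on the current labels
        new_labels = []
        for bag, y in zip(bags, bag_labels):
            if y == 0:  # negative bags stay all-negative
                new_labels.append([0] * len(bag))
                continue
            scores = [score(x) for x in bag]
            flags = [1 if s > 0 else 0 for s in scores]
            if not any(flags):
                # Check step: a positive bag needs a positive instance, so promote
                # the one least confidently negative (highest abnormality score).
                flags[max(range(len(bag)), key=scores.__getitem__)] = 1
            new_labels.append(flags)
        if new_labels == labels:  # labels unchanged between rounds: stop
            break
        labels = new_labels
    return labels, _fit(bags, labels)
```

The invariants of the procedure hold for every seed: negative bags come back all-negative, and each positive bag keeps at least one positive instance. Which instance ends up positive can depend on `seed`, because this is a local search from a random start, and on small toy data an unlucky initial pick can reinforce itself. Trying several seeds or using a stronger classifier are possible mitigations; the text prescribes neither.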
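Note that the initial assignment of instance labels is fairly random; during classifier training the instance labels are gradually corrected, and the corrected labels in turn make the trained classifier more accurate, thereby achieving an optimizing effect.
S305: Obtain the URL abnormality locating model based on the classification training.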
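The embodiments of this specification are illustrated below using the process of locating the abnormality in a specific URL.
Consider a normal URL, for example: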
http://render.alipay.com/p/s/alipay_site/wait?mintime=3&maxtime=5&fromspanner=goldetfprod_502
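Attackers often try to achieve their goals by modifying any part of a URL they can. In actual business, the domain name part (for example, http://render.alipay.com above) is basically fixed, and only the parameter request part can be modified, so attacks usually originate there. In the following example, an attacker manually modifies a normal URL in order to execute a script.
For example, an abnormal URL is: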
http://render.alipay.com/p/s/alipay_site/wait?mintime=3>SCRiPT={Sleep}&maxtime=5&fromspanner=goldetfprod_502
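Here the abnormal field is "mintime=3>SCRiPT={Sleep}".
Conventional techniques try to represent every URL uniformly, i.e., as a feature vector of a fixed dimension. Such methods may detect that a URL is abnormal, but they cannot precisely locate the abnormal field, such as the one in the example above. Precisely identifying the field in which the abnormality lies is desirable, and the purpose of this method is therefore to detect abnormal fields.
Specifically, the problem is formalized as a multi-instance learning problem. When data is collected, it is usually known which URLs are normal and which are abnormal; that is, labels are usually available for a URL as a whole, while whether each individual field is problematic is usually not observable. This matches the assumptions of multi-instance learning exactly: data is given in the form of bags, each bag has a label and contains multiple instances whose labels are unknown; if at least one instance in a bag is a positive instance, the bag is a positive bag, and if all instances are negative instances, the bag is a negative bag.
Taking the normal URL above as an example: segmenting the URL yields three sub-fields (instances), "mintime=3", "maxtime=5", and "fromspanner=goldetfprod_502", whose values are "3", "5", and "goldetfprod_502" respectively. Here one URL is one bag, represented as containing three instances. The information in each field can be extracted and represented in many ways; for example, the following patterns can be extracted from the value: total number of characters, number of letters, number of digits, number of symbols, number of distinct characters, number of distinct letters, number of distinct digits, and number of distinct symbols. The three instances in the bag above are then represented as [1,0,1,0,1,0,1,0], [1,0,1,0,1,0,1,0], and [15,11,3,1,13,9,3,1]. Since this URL is normal, the label of the bag is negative (i.e., a non-abnormal URL).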
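The eight value patterns listed above can be computed as follows. This is a sketch assuming the value is the text after the first `=` and that letters and digits are classified with Python's `str.isalpha`/`str.isdigit`, with every other character counted as a "symbol" (the text does not define "symbol" precisely). It reproduces the three vectors given for the normal URL; the function names are hypothetical.

```python
from typing import List


def field_value(field: str) -> str:
    """Value part of a 'name=value' field (the text after the first '=')."""
    _, sep, value = field.partition("=")
    return value if sep else field


def field_features(field: str) -> List[int]:
    """8-dimensional pattern vector of a field's value, in the order:
    total chars, letters, digits, symbols, distinct chars,
    distinct letters, distinct digits, distinct symbols."""
    value = field_value(field)
    letters = [c for c in value if c.isalpha()]
    digits = [c for c in value if c.isdigit()]
    # Assumption: a "symbol" is any character that is neither a letter nor a digit.
    symbols = [c for c in value if not (c.isalpha() or c.isdigit())]
    return [len(value), len(letters), len(digits), len(symbols),
            len(set(value)), len(set(letters)), len(set(digits)), len(set(symbols))]


for f in ["mintime=3", "maxtime=5", "fromspanner=goldetfprod_502"]:
    print(f, field_features(f))
# mintime=3 [1, 0, 1, 0, 1, 0, 1, 0]
# maxtime=5 [1, 0, 1, 0, 1, 0, 1, 0]
# fromspanner=goldetfprod_502 [15, 11, 3, 1, 13, 9, 3, 1]
```

Applied to the abnormal field "mintime=3>SCRiPT={Sleep}", the same function yields [16, 11, 1, 4, 14, 9, 1, 4], far from the [1, 0, 1, 0, 1, 0, 1, 0] of the normal "mintime=3".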
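Similarly, for the modified abnormal URL above, the model can identify which instance is abnormal, and the corresponding abnormal field (abnormal sub-field) is determined from that instance: the field "mintime=3>SCRiPT={Sleep}", which carries the injected content "SCRiPT={Sleep}". This facilitates abnormality analysis and prevention.
In the embodiments of this specification, a URL is represented as a bag of instances corresponding to its fields, and the URL abnormality positioning model predicts which instances are abnormal, thereby locating the abnormal fields in the URL. URL abnormality positioning based on multi-instance learning can effectively predict potential threats not yet discovered in daily access data; and because it identifies where the abnormality lies in an abnormal URL, it provides strong support for discovering potential threats, establishing new security rules, and building security systems.
Third aspect: based on the same inventive concept, an embodiment of this specification provides a URL abnormality positioning device. Referring to FIG. 4, the device includes:
a segmentation unit 401, configured to perform field segmentation on the URL to obtain a multi-instance bag composed of the instances corresponding to its respective fields;
a prediction unit 402, configured to input the multi-instance bag into a preset URL abnormality positioning model to predict abnormal instances;
a positioning unit 403, configured to locate the corresponding abnormal field according to the abnormal instance.
In an optional implementation, the device further includes:
a model training unit 404, configured to train on multiple URL samples based on a multi-instance learning algorithm to obtain the URL abnormality positioning model.
In an optional implementation, the model training unit 404 includes:
a sample segmentation subunit 4041, configured to perform field segmentation on each URL sample in the URL sample set, obtaining, for each URL sample, a multi-instance bag composed of the instances corresponding to its respective fields;
a bag collection subunit 4042, configured to collect the multi-instance bags of all URL samples to obtain a multi-instance bag set;
a training subunit 4043, configured to perform, based on a multi-instance learning algorithm, abnormal-instance and non-abnormal-instance classification training on the multi-instance bag set to obtain the URL abnormality positioning model.
In an optional implementation, abnormal instances and non-abnormal instances are distinguished by the value of the instance's abnormality label;
the training subunit 4043 is specifically configured to: initialize the value of the abnormality label of each instance in the multi-instance bag set, and iteratively learn these values, updating and adjusting them to obtain the final abnormality label value of each instance.
In an optional implementation, the prediction unit 402 is specifically configured to: predict, according to the URL abnormality positioning model, the abnormality label value of each instance in the multi-instance bag, thereby determining whether each instance is an abnormal instance.
In an optional implementation, the device further includes:
an abnormal-URL determination unit 405, configured to determine whether the URL is an abnormal URL: if the instance bag corresponding to the URL contains an abnormal instance, the URL is determined to be an abnormal URL; if it contains no abnormal instance, the URL is determined to be a non-abnormal URL.
In an optional implementation, each instance is represented by a feature vector of the corresponding field.
In an optional implementation, the fields are the request-parameter fields of the URL.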
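The device of FIG. 4 can be pictured as one object whose methods play the roles of units 401, 402, 403 and 405. This Python sketch is hypothetical throughout: the class and method names are illustrative, the trained model and the field featurizer are injected as plain callables, and the usage below plugs in a toy bracket-counting rule purely to exercise the wiring. That rule is a placeholder and not the trained URL abnormality positioning model.

```python
from typing import Callable, List, Sequence
from urllib.parse import urlsplit

# Hypothetical model interface: feature vector -> abnormality label (1 or 0).
InstancePredictor = Callable[[Sequence[float]], int]


class UrlAbnormalityLocator:
    """Sketch of the FIG. 4 device; method names mirror the units."""

    def __init__(self, predict: InstancePredictor,
                 featurize: Callable[[str], Sequence[float]]) -> None:
        self.predict = predict      # trained positioning model used by unit 402
        self.featurize = featurize  # field -> feature vector of the instance

    def segment(self, url: str) -> List[str]:  # segmentation unit 401
        return [f for f in urlsplit(url).query.split("&") if f]

    def predict_instances(self, fields: List[str]) -> List[int]:  # prediction unit 402
        return [self.predict(self.featurize(f)) for f in fields]

    def locate(self, url: str) -> List[str]:  # positioning unit 403
        fields = self.segment(url)
        labels = self.predict_instances(fields)
        return [f for f, y in zip(fields, labels) if y == 1]

    def is_abnormal_url(self, url: str) -> bool:  # abnormal-URL determination unit 405
        return bool(self.locate(url))


def toy_featurize(field: str) -> List[float]:
    # Placeholder feature: number of '<', '>', '{', '}' characters in the field.
    return [float(sum(c in "<>{}" for c in field))]


def toy_predict(vector: Sequence[float]) -> int:
    # Placeholder rule standing in for the trained model.
    return 1 if vector[0] > 0 else 0


locator = UrlAbnormalityLocator(toy_predict, toy_featurize)
bad = ("http://render.alipay.com/p/s/alipay_site/wait"
       "?mintime=3>SCRiPT={Sleep}&maxtime=5&fromspanner=goldetfprod_502")
print(locator.locate(bad))           # ['mintime=3>SCRiPT={Sleep}']
print(locator.is_abnormal_url(bad))  # True
```

Injecting the model as a callable keeps the segmentation and positioning logic independent of whichever instance classifier the training device produces.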
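Fourth aspect: based on the same inventive concept, an embodiment of this specification provides a URL abnormality positioning training device. Referring to FIG. 5, the device includes:
a sample acquisition unit 501, configured to collect a URL sample set consisting of multiple URL samples;
a sample segmentation unit 502, configured to perform field segmentation on each URL sample in the URL sample set, obtaining, for each URL sample, a multi-instance bag composed of the instances corresponding to its respective fields;
a bag collection unit 503, configured to collect the multi-instance bags of all URL samples to obtain a multi-instance bag set;
a training unit 504, configured to perform, based on a multi-instance learning algorithm, abnormal-instance and non-abnormal-instance classification training on the multi-instance bag set to obtain the URL abnormality positioning model.
In an optional implementation, abnormal instances and non-abnormal instances are distinguished by the value of the instance's abnormality label;
the training unit 504 is specifically configured to: initialize the value of the abnormality label of each instance in the multi-instance bag set, and iteratively learn these values, updating and adjusting them to obtain the final abnormality label value of each instance.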
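Fifth aspect: based on the same inventive concept as the URL abnormality positioning method in the foregoing embodiments, the present invention further provides a server. As shown in FIG. 6, the server includes a memory 604, a processor 602, and a computer program stored in the memory 604 and executable on the processor 602; when executing the program, the processor 602 implements the steps of the URL abnormality positioning method described above.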
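In FIG. 6, in the bus architecture (represented by bus 600), the bus 600 may include any number of interconnected buses and bridges, and links together various circuits including one or more processors represented by processor 602 and memory represented by memory 604. The bus 600 may also link various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further herein. A bus interface 606 provides an interface between the bus 600 and a receiver 601 and a transmitter 603. The receiver 601 and the transmitter 603 may be the same element, i.e., a transceiver, providing a unit for communicating with various other devices over a transmission medium. The processor 602 is responsible for managing the bus 600 and general processing, and the memory 604 may be used to store data used by the processor 602 when performing operations.
Sixth aspect: based on the same inventive concept as the URL abnormality positioning method in the foregoing embodiments, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the URL abnormality positioning methods described above.
This specification is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of this specification. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of this specification have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications falling within the scope of this specification.
Obviously, those skilled in the art can make various changes and variations to this specification without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of this specification and their technical equivalents, this specification is also intended to include them.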

Claims (22)

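  1. A URL abnormality positioning method, comprising:
    performing field segmentation on a URL to obtain a multi-instance bag composed of instances corresponding to respective fields;
    inputting the multi-instance bag into a URL abnormality positioning model based on multi-instance learning to predict abnormal instances; and
    locating a corresponding abnormal field according to an abnormal instance.
  2. The method according to claim 1, further comprising:
    training on a plurality of URL samples based on a multi-instance learning algorithm to obtain the URL abnormality positioning model.
  3. The method according to claim 2, wherein the training on a plurality of URL samples based on a multi-instance learning algorithm to obtain the URL abnormality positioning model comprises:
    performing field segmentation on each URL sample in a URL sample set to obtain, for each URL sample, a multi-instance bag composed of instances corresponding to respective fields;
    collecting the multi-instance bags of the URL samples to obtain a multi-instance bag set;
    performing, based on a multi-instance learning algorithm, abnormal-instance and non-abnormal-instance classification training on the multi-instance bag set; and
    obtaining the URL abnormality positioning model based on the classification training.
  4. The method according to claim 3, wherein the abnormal instances and the non-abnormal instances are distinguished by values of abnormality labels of the instances; and
    the performing abnormal-instance and non-abnormal-instance classification training on the multi-instance bag set comprises:
    initializing the value of the abnormality label of each instance in the multi-instance bag set, and iteratively learning the values of the abnormality labels to update and adjust a final value of the abnormality label of each instance.
  5. The method according to claim 4, wherein the inputting the multi-instance bag into a preset URL abnormality positioning model to predict abnormal instances comprises:
    predicting, according to the URL abnormality positioning model, the value of the abnormality label of each instance in the multi-instance bag, so as to predict whether each instance in the multi-instance bag is an abnormal instance.
  6. The method according to any one of claims 1-5, further comprising:
    determining whether the URL is an abnormal URL:
    if the instance bag corresponding to the URL contains an abnormal instance, determining that the URL is an abnormal URL;
    if the instance bag corresponding to the URL contains no abnormal instance, determining that the URL is a non-abnormal URL.
  7. The method according to any one of claims 1-5, wherein each instance is represented by a feature vector of the corresponding field.
  8. The method according to any one of claims 1-5, wherein the fields are request-parameter fields in the URL.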
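  9. A URL abnormality positioning training method, comprising:
    collecting a URL sample set consisting of a plurality of URL samples;
    performing field segmentation on each URL sample in the URL sample set to obtain, for each URL sample, a multi-instance bag composed of instances corresponding to respective fields;
    collecting the multi-instance bags of the URL samples to obtain a multi-instance bag set;
    performing, based on a multi-instance learning algorithm, abnormal-instance and non-abnormal-instance classification training on the multi-instance bag set; and
    obtaining the URL abnormality positioning model based on the classification training.
  10. The method according to claim 9, wherein the abnormal instances and the non-abnormal instances are distinguished by values of abnormality labels of the instances; and
    the performing abnormal-instance and non-abnormal-instance classification training on the multi-instance bag set comprises:
    initializing the value of the abnormality label of each instance in the multi-instance bag set, and iteratively learning the values of the abnormality labels to update and adjust a final value of the abnormality label of each instance.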
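  11. A URL abnormality positioning device, comprising:
    a segmentation unit, configured to perform field segmentation on a URL to obtain a multi-instance bag composed of instances corresponding to respective fields;
    a prediction unit, configured to input the multi-instance bag into a URL abnormality positioning model based on multi-instance learning to predict abnormal instances; and
    a positioning unit, configured to locate a corresponding abnormal field according to an abnormal instance.
  12. The device according to claim 11, further comprising:
    a model training unit, configured to train on a plurality of URL samples based on a multi-instance learning algorithm to obtain the URL abnormality positioning model.
  13. The device according to claim 12, wherein the model training unit comprises:
    a sample segmentation subunit, configured to perform field segmentation on each URL sample in a URL sample set to obtain, for each URL sample, a multi-instance bag composed of instances corresponding to respective fields;
    a bag collection subunit, configured to collect the multi-instance bags of the URL samples to obtain a multi-instance bag set; and
    a training subunit, configured to perform, based on a multi-instance learning algorithm, abnormal-instance and non-abnormal-instance classification training on the multi-instance bag set to obtain the URL abnormality positioning model.
  14. The device according to claim 13, wherein the abnormal instances and the non-abnormal instances are distinguished by values of abnormality labels of the instances; and
    the training subunit is specifically configured to: initialize the value of the abnormality label of each instance in the multi-instance bag set, and iteratively learn the values of the abnormality labels to update and adjust a final value of the abnormality label of each instance.
  15. The device according to claim 14, wherein the prediction unit is specifically configured to: predict, according to the URL abnormality positioning model, the value of the abnormality label of each instance in the multi-instance bag, thereby determining whether each instance is an abnormal instance.
  16. The device according to any one of claims 11-15, further comprising:
    an abnormal-URL determination unit, configured to determine whether the URL is an abnormal URL: if the instance bag corresponding to the URL contains an abnormal instance, determine that the URL is an abnormal URL; if the instance bag corresponding to the URL contains no abnormal instance, determine that the URL is a non-abnormal URL.
  17. The device according to any one of claims 11-15, wherein each instance is represented by a feature vector of the corresponding field.
  18. The device according to any one of claims 11-15, wherein the fields are request-parameter fields in the URL.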
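  19. A URL abnormality positioning training device, comprising:
    a sample acquisition unit, configured to collect a URL sample set consisting of a plurality of URL samples;
    a sample segmentation unit, configured to perform field segmentation on each URL sample in the URL sample set to obtain, for each URL sample, a multi-instance bag composed of instances corresponding to respective fields;
    a bag collection unit, configured to collect the multi-instance bags of the URL samples to obtain a multi-instance bag set; and
    a training unit, configured to perform, based on a multi-instance learning algorithm, abnormal-instance and non-abnormal-instance classification training on the multi-instance bag set to obtain the URL abnormality positioning model.
  20. The device according to claim 19, wherein the abnormal instances and the non-abnormal instances are distinguished by values of abnormality labels of the instances; and
    the training unit is specifically configured to: initialize the value of the abnormality label of each instance in the multi-instance bag set, and iteratively learn the values of the abnormality labels to update and adjust a final value of the abnormality label of each instance.
  21. A server, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1-10.
  22. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1-10.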
PCT/CN2019/073629 2018-03-06 2019-01-29 URL abnormality positioning method and device, and server and storage medium WO2019169982A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
SG11202005828UA SG11202005828UA (en) 2018-03-06 2019-01-29 Url abnormality positioning method and device, and server and storage medium
EP19763581.6A EP3716571B1 (en) 2018-03-06 2019-01-29 Url abnormality positioning method and device, and server and storage medium
US16/878,521 US10819745B2 (en) 2018-03-06 2020-05-19 URL abnormality positioning method and device, and server and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810182571.XA CN108366071B (zh) 2018-03-06 2018-03-06 URL abnormality positioning method and device, and server and storage medium
CN201810182571.X 2018-03-06

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/878,521 Continuation US10819745B2 (en) 2018-03-06 2020-05-19 URL abnormality positioning method and device, and server and storage medium

Publications (1)

Publication Number Publication Date
WO2019169982A1 true WO2019169982A1 (zh) 2019-09-12

Family

ID=63003692

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/073629 WO2019169982A1 (zh) 2018-03-06 2019-01-29 URL abnormality positioning method and device, and server and storage medium

Country Status (6)

Country Link
US (1) US10819745B2 (zh)
EP (1) EP3716571B1 (zh)
CN (1) CN108366071B (zh)
SG (1) SG11202005828UA (zh)
TW (1) TWI703846B (zh)
WO (1) WO2019169982A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108366071B (zh) 2018-03-06 2020-06-23 Alibaba Group Holding Limited URL abnormality positioning method and device, and server and storage medium
US11762990B2 (en) * 2020-04-07 2023-09-19 Microsoft Technology Licensing, Llc Unstructured text classification
US12003535B2 (en) 2021-03-01 2024-06-04 Microsoft Technology Licensing, Llc Phishing URL detection using transformers

Citations (7)

Publication number Priority date Publication date Assignee Title
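CN106055574A (zh) * 2016-05-19 2016-10-26 Weimeng Chuangke Network Technology (China) Co., Ltd. Method and device for identifying an illegal uniform resource identifier (URL)
CN106131071A (zh) * 2016-08-26 2016-11-16 Beijing Qihoo Technology Co., Ltd. Web anomaly detection method and device
CN107577945A (zh) * 2017-09-28 2018-01-12 Alibaba Group Holding Limited URL attack detection method and device, and electronic device
CN107992741A (zh) * 2017-10-24 2018-05-04 Alibaba Group Holding Limited Model training method, and URL detection method and device
CN108111489A (zh) * 2017-12-07 2018-06-01 Alibaba Group Holding Limited URL attack detection method and device, and electronic device
CN108229156A (zh) * 2017-12-28 2018-06-29 Alibaba Group Holding Limited URL attack detection method and device, and electronic device
CN108366071A (zh) * 2018-03-06 2018-08-03 Alibaba Group Holding Limited URL abnormality positioning method and device, and server and storage medium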

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US8019700B2 (en) * 2007-10-05 2011-09-13 Google Inc. Detecting an intrusive landing page
US8261342B2 (en) * 2008-08-20 2012-09-04 Reliant Security Payment card industry (PCI) compliant architecture and associated methodology of managing a service infrastructure
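CN101702660B (zh) * 2009-11-12 2011-12-14 Institute of Computing Technology, Chinese Academy of Sciences Abnormal domain name detection method and system
KR20140061654A (ko) * 2012-11-14 2014-05-22 Korea Internet & Security Agency High-risk malicious code identification system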
US9904893B2 (en) * 2013-04-02 2018-02-27 Patternex, Inc. Method and system for training a big data machine to defend
US10079876B1 (en) * 2014-09-30 2018-09-18 Palo Alto Networks, Inc. Mobile URL categorization
US10178107B2 (en) * 2016-04-06 2019-01-08 Cisco Technology, Inc. Detection of malicious domains using recurring patterns in domain names
AU2017281232B2 (en) * 2016-06-22 2020-02-13 Invincea, Inc. Methods and apparatus for detecting whether a string of characters represents malicious activity using machine learning

Non-Patent Citations (1)

Title
See also references of EP3716571A4 *

Also Published As

Publication number Publication date
CN108366071B (zh) 2020-06-23
US10819745B2 (en) 2020-10-27
TWI703846B (zh) 2020-09-01
TW201939932A (zh) 2019-10-01
CN108366071A (zh) 2018-08-03
EP3716571B1 (en) 2023-08-09
SG11202005828UA (en) 2020-07-29
US20200280583A1 (en) 2020-09-03
EP3716571A4 (en) 2021-01-20
EP3716571A1 (en) 2020-09-30

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 19763581

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019763581

Country of ref document: EP

Effective date: 20200623

NENP Non-entry into the national phase

Ref country code: DE