CN106940711B - URL detection method and detection device - Google Patents


Info

Publication number
CN106940711B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710108755.7A
Other languages
Chinese (zh)
Other versions
CN106940711A
Inventor
张龙
李志强
王晓琪
刘敏
高学龄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nsfocus Technologies Inc
Nsfocus Technologies Group Co Ltd
Original Assignee
NSFOCUS Information Technology Co Ltd
Beijing NSFocus Information Security Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NSFOCUS Information Technology Co Ltd and Beijing NSFocus Information Security Technology Co Ltd
Priority to CN201710108755.7A
Publication of CN106940711A
Application granted
Publication of CN106940711B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links

Abstract

The embodiment of the invention discloses a URL detection method and a detection device that address the inability of the prior art to extract parameters accurately. The method comprises: acquiring path information of a URL to be detected; abstracting the path information to obtain a native pattern of the URL to be detected; determining, according to the native pattern, the URL family to which the URL to be detected belongs, the URL family having the same native pattern as the URL to be detected; merging the URL to be detected into the URL family; selecting a representative pattern of the URL to be detected from the URL family; and extracting parameters from the URL to be detected according to the representative pattern for detection, and taking the detection result of the URL to be detected as the detection result of the representative pattern. Because parameters are extracted only from the abstracted part of the URL to be detected, the accuracy of parameter extraction, and therefore of URL detection, is improved.

Description

URL detection method and detection device
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a URL detection method and a detection apparatus.
Background
In risk assessment by web application scanning, every potential point needs to be detected and assessed. Each Uniform Resource Locator (URL) is a point that may contain a vulnerability; however, the sheer number of URLs makes checking them tedious or even impossible. URLs of the same category generally share the same vulnerabilities, so accurately de-duplicating them helps evaluate a site's vulnerabilities more efficiently with less repeated detection. In practice, parameters and parameter values are extracted to construct payload data (payloads) for detection, and pages, directories, parameter names and even parameter values are de-duplicated according to detection rules.
The prior art can only handle links in the conventional standard form, identified by special characters in the URL. For links such as http://www.test.com/admin/easycheck/exercerd/?batch_id=28 and http://www.test.com/admin/easycheck/exercerd/?batch_id=29, the query string is identified by "?", and the different parameters are distinguished within the query string, which achieves parameter extraction and de-duplication. In this example, the parameter of both URLs is batch_id, with values 28 and 29 respectively, so for detection logic such as Cross-Site Scripting (XSS) only one of the two URLs needs to be detected.
However, many websites on the Internet use URL rewriting. For example, after URL rewriting, http://www.somebloghost.com/Blogs/posts.php?Year=2006&Month=12&Day=10 can become http://www.somebloghost.com/Blogs/2006/12/10/, and the conventional way of extracting and de-duplicating parameters no longer applies. For URLs such as https://www.oschina.net/news/74686/chandao-8-2-3 it is even harder to identify which parts are variable, meaningful parameters. When assessing the risk of a large Internet site that uses URL rewriting (such as JD.com), the site is so large and its URL paths so varied that, if URLs cannot be effectively de-duplicated, scanning becomes a massive amount of repeated, redundant and inefficient work that may never finish; if parameters cannot be extracted effectively, scanning becomes aimless work with extremely low accuracy.
In short, the prior art cannot accurately extract the parameters of the rewritten URL.
Disclosure of Invention
The invention provides a URL detection method and a detection device to solve the problem of low detection efficiency in the prior art, which cannot accurately de-duplicate rewritten URLs.
The embodiment of the invention provides a Uniform Resource Locator (URL) detection method, which comprises the following steps:
acquiring path information of a URL to be detected, the URL to be detected being a rewritten URL;
abstracting the path information of the URL to be detected to obtain a native pattern of the URL to be detected;
determining, according to the native pattern, a URL family to which the URL to be detected belongs, the URL family having the same native pattern as the URL to be detected;
merging the URL to be detected into the URL family;
selecting a representative pattern of the URL to be detected from the URL family; and
extracting parameters from the URL to be detected according to the representative pattern for detection, and taking the detection result of the URL to be detected as the detection result of the representative pattern.
Optionally, abstracting the path information of the URL to be detected to obtain its native pattern includes:
converting non-special characters in the URL to be detected into letters or digits by regular-expression replacement, where non-special characters are characters that do not act as separators in the URL to be detected; and
abstracting, according to a set rule, the character string between each pair of separators in the URL to be detected after the regular-expression replacement.
Optionally, abstracting, according to a set rule, the character string between each pair of separators in the URL to be detected after the regular-expression replacement includes:
if the string between separators is alphabetic, abstracting it into a first identifier;
if the string between separators is numeric, abstracting it into a second identifier;
if the string between separators consists of both digits and letters, abstracting it into a third identifier.
Optionally, incorporating the URL to be detected into the URL family includes:
the URL family is a pattern tree constructed from all URLs in the URL family by layer-by-layer abstraction;
acquiring N historical URLs from the URL family, where N is a positive integer;
for each historical URL, performing the following processing:
comparing the URL to be detected with the historical URL pairwise to obtain the differences between them;
abstracting the differences between the URL to be detected and the historical URL layer by layer to construct a sub-pattern tree of the URL to be detected and the historical URL; and
the sub pattern tree is incorporated into the URL family.
Optionally, merging the sub-pattern tree into the URL family includes: comparing the nodes of the sub-pattern tree with the nodes of the URL family pattern tree in order from shallow to deep, where a sub-pattern tree node is a URL pattern in the sub-pattern tree and a URL family pattern tree node is a URL pattern in the URL family pattern tree;
if neither the sub-pattern tree node nor the URL family pattern tree node contains the other, directly combining the sub-pattern tree node with the URL family pattern tree node;
if the sub-pattern tree node contains the URL family pattern tree node, comparing the child nodes of that sub-pattern tree node with the URL family pattern tree node, until the sub-pattern tree has been merged into the URL family pattern tree;
if the URL family pattern tree node contains the sub-pattern tree node, comparing the child nodes of that URL family pattern tree node with the sub-pattern tree node, until the sub-pattern tree has been merged into the URL family pattern tree.
Optionally, selecting a representative pattern of the URL to be detected from the URL family includes:
traversing the pattern tree of the URL family layer by layer from deep to shallow, starting from the URL to be detected; and
when a pattern node is found under which the number of concrete (non-abstracted) URLs exceeds a preset threshold, taking that pattern node as the representative pattern of the URL to be detected.
Optionally, the method includes:
if the representative pattern has already been detected, taking the detection result of the representative pattern as the detection result of the URL to be detected.
Optionally, the method further includes:
if none of the existing URL families has the same native pattern as the URL to be detected, storing the native pattern of the URL to be detected as a new URL family; and
and detecting the URL to be detected.
The embodiment of the invention provides a Uniform Resource Locator (URL) detection device, which comprises:
an acquisition module, configured to acquire the path information of a URL to be detected, the URL to be detected being a rewritten URL;
an abstraction module, configured to abstract the path information of the URL to be detected to obtain a native pattern of the URL to be detected;
a query module, configured to determine, according to the native pattern, the URL family to which the URL to be detected belongs, the URL family having the same native pattern as the URL to be detected;
a merging module, configured to merge the URL to be detected into the URL family;
an extraction module, configured to select a representative pattern of the URL to be detected from the URL family; and
a processing module, configured to extract parameters from the URL to be detected according to the representative pattern for detection, and to take the detection result of the URL to be detected as the detection result of the representative pattern.
Optionally, the abstraction module is specifically configured to:
converting non-special characters in the URL to be detected into letters or digits by regular-expression replacement, where non-special characters are characters that do not act as separators in the URL to be detected; and
abstracting, according to a set rule, the character string between each pair of separators in the URL to be detected after the regular-expression replacement.
Optionally, the abstraction module is specifically configured to:
if the string between separators is alphabetic, abstracting it into a first identifier;
if the string between separators is numeric, abstracting it into a second identifier;
if the string between separators consists of both digits and letters, abstracting it into a third identifier.
Optionally, the URL family is a pattern tree constructed from the URLs in the URL family by layer-by-layer abstraction;
the merging module is specifically configured to:
acquire N historical URLs from the URL family, where N is a positive integer;
for each historical URL, perform the following processing:
comparing the URL to be detected with the historical URL pairwise to obtain the differences between them;
abstracting the differences between the URL to be detected and the historical URL layer by layer to construct a sub-pattern tree of the URL to be detected and the historical URL;
the sub pattern tree is incorporated into the URL family.
Optionally, the merging module is specifically configured to:
compare the nodes of the sub-pattern tree with the nodes of the URL family pattern tree in order from shallow to deep, where a sub-pattern tree node is a URL pattern in the sub-pattern tree and a URL family pattern tree node is a URL pattern in the URL family pattern tree;
if neither the sub-pattern tree node nor the URL family pattern tree node contains the other, directly combine the sub-pattern tree node with the URL family pattern tree node;
if the sub-pattern tree node contains the URL family pattern tree node, compare the child nodes of that sub-pattern tree node with the URL family pattern tree node, until the sub-pattern tree has been merged into the URL family pattern tree;
if the URL family pattern tree node contains the sub-pattern tree node, compare the child nodes of that URL family pattern tree node with the sub-pattern tree node, until the sub-pattern tree has been merged into the URL family pattern tree.
Optionally, the extraction module is specifically configured to:
traverse the pattern tree of the URL family layer by layer from deep to shallow, starting from the URL to be detected; and
when a pattern node is found under which the number of concrete (non-abstracted) URLs exceeds a preset threshold, take that pattern node as the representative pattern of the URL to be detected.
Optionally, the processing module is specifically configured to:
if the representative pattern has already been detected, take the detection result of the representative pattern as the detection result of the URL to be detected.
Optionally, the processing module is further configured to:
if none of the existing URL families has the same native pattern as the URL to be detected, store the native pattern of the URL to be detected as a new URL family; and
and detecting the URL to be detected.
In summary, embodiments of the present invention provide a URL detection method and a detection apparatus, including: acquiring path information of a URL to be detected, the URL to be detected being a rewritten URL; abstracting the path information to obtain a native pattern of the URL to be detected; determining, according to the native pattern, the URL family to which the URL to be detected belongs, the URL family having the same native pattern as the URL to be detected; merging the URL to be detected into the URL family; selecting a representative pattern of the URL to be detected from the URL family; and extracting parameters from the URL to be detected according to the representative pattern for detection, and taking the detection result of the URL to be detected as the detection result of the representative pattern. The native pattern obtained by abstracting the path information captures the coarse features of the URL to be detected. Based on it, the URL is merged into its URL family, whose URLs all share those coarse features. Within the family, the coarse features under the native pattern are further subdivided into several URL pattern branches. The structural features of one of these patterns, the representative pattern, represent the structural features of the URL to be detected, and the characteristic part of the URL to be detected is abstracted in the representative pattern, so parameters can be extracted from exactly that part.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic flowchart of a URL detection method according to an embodiment of the present invention;
Fig. 2 is a first schematic diagram of pattern tree merging according to an embodiment of the present invention;
Fig. 3 is a second schematic diagram of pattern tree merging according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a detection apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of a URL detection method according to an embodiment of the present invention, as shown in fig. 1, including the following steps:
S101: acquiring path information of a URL to be detected, the URL to be detected being a rewritten URL;
S102: abstracting the path information of the URL to be detected to obtain a native pattern of the URL to be detected;
S103: determining, according to the native pattern, a URL family to which the URL to be detected belongs, the URL family having the same native pattern as the URL to be detected;
S104: merging the URL to be detected into the URL family;
S105: selecting a representative pattern of the URL to be detected from the URL family;
S106: extracting parameters from the URL to be detected according to the representative pattern for detection, and taking the detection result of the URL to be detected as the detection result of the representative pattern.
In a specific implementation, the URL to be detected in this embodiment is a URL that has undergone URL rewriting. It is any one of a large number of URLs to be detected; after it has been detected, another undetected URL is taken from that set as the new URL to be detected and the steps are repeated.
A rewritten URL can be decomposed into a protocol name (scheme), a host, a port and a path. Note that this embodiment assumes the URLs share the same scheme, host and port: URLs with the same scheme, host and port form one processing set, and the detection method disclosed here is applied to the URLs within one processing set. Therefore, before detection, the URLs must be split into streams according to their scheme, host and port. Because URLs from the same source generally have the same scheme, host and port, they can also be split directly by source, after which the method disclosed in this embodiment is applied to each stream.
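The split into processing sets can be sketched with Python's standard urllib.parse.urlsplit. This is a minimal sketch, not the patented implementation: the function name split_into_processing_sets and the DEFAULT_PORTS table (which puts an explicit :80 and an implicit port into the same set) are assumptions, and only the path of each URL is kept because the method works on the path.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: default ports for the two common web schemes, so that
# "http://host/..." and "http://host:80/..." land in the same processing set.
DEFAULT_PORTS = {"http": 80, "https": 443}


def split_into_processing_sets(urls):
    """Group URLs by (scheme, host, port) and keep only the path of each URL."""
    sets = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        port = parts.port or DEFAULT_PORTS.get(parts.scheme)
        sets[(parts.scheme, parts.hostname, port)].append(parts.path)
    return dict(sets)


if __name__ == "__main__":
    urls = [
        "http://www.somebloghost.com/Blogs/2006/12/10/",
        "http://www.somebloghost.com:80/Blogs/2006/12/11/",
        "https://www.oschina.net/news/74686/chandao-8-2-3",
    ]
    for key, paths in split_into_processing_sets(urls).items():
        print(key, paths)
```

Each resulting list of paths is then processed independently by the steps S101 to S106.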
In the specific implementation process of S101, the path information of the URL to be detected includes the parameter and parameter value information of the URL to be detected, and therefore, the path information is a main object processed in the embodiment of the present invention. The extraction of the path information may be performed according to an existing path recognition method.
In the specific implementation of S102, abstracting the path information of the URL to be detected means abstracting its specific content: preferably, letters, digits and their combinations in the path are distinguished by different identifiers, so that the structural features of the path information are abstracted.
In the specific implementation of S103, already-detected URLs that share the same native pattern form a URL family. Because there are many detected URLs with different native patterns, there are many URL families. After the native pattern of the URL to be detected is obtained, the URL families are traversed; a family whose native pattern is the same as that of the URL to be detected is the family to which the URL to be detected belongs. Preferably, if none of the existing URL families has the same native pattern as the URL to be detected, the native pattern of the URL to be detected is stored as a new URL family, and the URL to be detected is detected.
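The "find or create" step for URL families can be sketched with a dictionary keyed by native pattern. The helper assign_to_family is hypothetical, and it stores each family as a flat list of member paths rather than the pattern tree the family actually uses; it only illustrates the family lookup and the "new family means detect directly" decision.

```python
from typing import Dict, List


def assign_to_family(families: Dict[str, List[str]], native: str, path: str) -> bool:
    """Add `path` to the URL family keyed by its native pattern.

    Returns True when no family with this native pattern existed yet; a new
    family is then created and the URL has to be detected directly.
    """
    members = families.get(native)
    if members is None:
        families[native] = [path]
        return True
    members.append(path)
    return False


if __name__ == "__main__":
    families: Dict[str, List[str]] = {}
    print(assign_to_family(families, r"/\c+/\d+/\d+/\d+/", "/Blogs/2006/12/10/"))
    print(assign_to_family(families, r"/\c+/\d+/\d+/\d+/", "/Blogs/2006/12/11/"))
```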
In the specific implementation of S104, the URL family is the one to which the URL to be detected belongs. Because the family and the URL to be detected share the same native pattern, the URL to be detected can be merged into the family after a certain degree of abstraction. Preferably, historical URLs are taken from the URL family and compared with the URL to be detected; each historical URL and the URL to be detected are abstracted layer by layer according to their local differences until a common pattern is reached, and then the URL to be detected, together with the intermediate patterns generated during the layer-by-layer abstraction, is merged into the URL family.
In the specific implementation of S105, after S104 the URL family contains multiple URLs and URL patterns, and the representative pattern of the URL to be detected is selected from the URL patterns merged into the family in S104. Optionally, the representative pattern can be chosen according to the depth of each URL pattern and the number of leaf nodes under it. The depth of a URL pattern is the number of abstraction layers between it and the native pattern, counted along the layer-by-layer abstraction of the URL to be detected: the more layers, the deeper the pattern. The number of leaf nodes of a URL pattern is the number of concrete URLs under it.
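The deep-to-shallow selection of a representative pattern can be sketched on a small pattern tree. PatternNode, representative_pattern, and the choice to return None when no ancestor qualifies are illustrative assumptions. Leaves are taken to be concrete URL paths, so a node's leaf count is its number of non-abstracted URLs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class PatternNode:
    """A node in a URL family's pattern tree; leaves are concrete URL paths."""
    pattern: str
    parent: Optional["PatternNode"] = field(default=None, repr=False)
    children: List["PatternNode"] = field(default_factory=list)

    def add(self, pattern: str) -> "PatternNode":
        child = PatternNode(pattern, parent=self)
        self.children.append(child)
        return child

    def leaf_count(self) -> int:
        """Number of concrete (non-abstracted) URLs under this node."""
        if not self.children:
            return 1
        return sum(child.leaf_count() for child in self.children)


def representative_pattern(leaf: PatternNode, threshold: int) -> Optional[PatternNode]:
    """Walk from the URL's leaf towards the root (deep to shallow) and return
    the first pattern node holding more than `threshold` concrete URLs."""
    node = leaf.parent
    while node is not None:
        if node.leaf_count() > threshold:
            return node
        node = node.parent
    return None  # assumption: no qualifying node, so the URL is detected on its own


if __name__ == "__main__":
    root = PatternNode(r"/\c+/\d+/")
    cate = root.add(r"/cate/\d+/")
    group = cate.add(r"/cate/10870\d/")
    url = group.add("/cate/108705/")
    group.add("/cate/108709/")
    cate.add("/cate/42/")
    print(representative_pattern(url, 2).pattern)
```

A higher threshold moves the representative pattern towards the root, which trades finer-grained parameter extraction for more de-duplication.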
In the specific implementation of S106, the structural features of the representative pattern represent those of the URL to be detected, and the characteristic part of the URL to be detected is abstracted in the representative pattern. Parameters are therefore extracted only from the abstracted part of the URL to be detected, which improves the accuracy of parameter extraction and thus of URL detection. For example, if the path of the URL to be detected is /10882/ and its representative pattern is /1088\d/, the extracted parameter is 2; if its representative pattern is /108\d\d/, the extracted parameter is 82.
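The /10882/ example can be reproduced by turning a representative pattern into a regular expression in which each run of adjacent abstract tokens becomes one capture group. This is a sketch under assumptions: the token-to-regex mapping (in particular reading \w+ as "any characters up to the next /") and the helper names are not taken from the patent.

```python
import re
from typing import List, Optional

# Assumed mapping from the identifiers used in this document to regex fragments.
TOKEN_REGEX = {
    r"\c": "[a-zA-Z]",
    r"\d": "[0-9]",
    r"\w": "[a-zA-Z0-9]",
    r"\c+": "[a-zA-Z]+",
    r"\d+": "[0-9]+",
    r"\w+": "[^/]+",
}
# A token is \c, \d or \w with an optional "+", an escaped literal such as "\.",
# or any other single character.
TOKENIZER = re.compile(r"\\[cdw]\+?|\\.|.")


def pattern_to_regex(pattern: str) -> str:
    """Escape literal characters; wrap each run of adjacent abstract tokens in one group."""
    out, in_group = [], False
    for tok in TOKENIZER.findall(pattern):
        if tok in TOKEN_REGEX:
            if not in_group:
                out.append("(")
                in_group = True
            out.append(TOKEN_REGEX[tok])
        else:
            if in_group:
                out.append(")")
                in_group = False
            out.append(re.escape(tok[-1]))  # "\." becomes a literal dot
    if in_group:
        out.append(")")
    return "".join(out)


def extract_parameters(path: str, pattern: str) -> Optional[List[str]]:
    """Return the values at the abstracted positions, or None if the path does not fit."""
    match = re.fullmatch(pattern_to_regex(pattern), path)
    return list(match.groups()) if match else None


if __name__ == "__main__":
    print(extract_parameters("/10882/", r"/1088\d/"))
    print(extract_parameters("/10882/", r"/108\d\d/"))
```

Grouping adjacent tokens is what makes /108\d\d/ yield the single parameter 82 rather than two separate digits.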
Preferably, if the representative pattern has already been detected, its detection result is used as the detection result of the URL to be detected. The detection result of the representative pattern stands for the detection results of all leaf-node URLs under that URL pattern, and it is determined by the detection result of one of those leaf-node URLs. Optionally, if the representative pattern has no detection result yet, none of its leaf-node URLs has been detected; in that case the URL to be detected is detected and its result is used as the detection result of the URL pattern. This de-duplicates URLs and improves URL detection efficiency.
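Reusing a representative pattern's detection result can be sketched as a cache keyed by the pattern string. RepresentativeResultCache is hypothetical, and `scan` is a placeholder for whatever detection logic (for example an XSS check) is actually run on a URL.

```python
from typing import Callable, Dict


class RepresentativeResultCache:
    """Keep one detection result per representative pattern (URL de-duplication)."""

    def __init__(self, scan: Callable[[str], str]):
        self.scan = scan  # hypothetical detector: URL -> result
        self.results: Dict[str, str] = {}

    def detect(self, url: str, representative: str) -> str:
        if representative in self.results:
            # The representative pattern was already detected: reuse its result.
            return self.results[representative]
        # No result yet: detect this URL and record it as the pattern's result.
        result = self.scan(url)
        self.results[representative] = result
        return result


if __name__ == "__main__":
    cache = RepresentativeResultCache(lambda url: "scanned " + url)
    print(cache.detect("/cate/108705/", r"/cate/10870\d/"))
    print(cache.detect("/cate/108709/", r"/cate/10870\d/"))  # reuses the first result
```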
The path information of the URL to be detected is abstracted to obtain its native pattern, which captures the coarse features of the URL. Based on the native pattern, the URL is merged into its URL family, whose URLs all share those coarse features. Within the family, the coarse features under the native pattern are further subdivided into several URL pattern branches, and the detection result of one of these patterns, the representative pattern, can stand for the detection result of the URL to be detected. If the representative pattern has already been detected, the URL to be detected does not need to be detected, which simplifies URL detection and improves detection efficiency. In addition, the method can run in real time without first building a detection model, which makes detection simpler and more convenient.
This embodiment provides a feasible way of obtaining the native pattern of the URL to be detected: converting non-special characters in the URL to be detected into letters or digits by regular-expression replacement, where non-special characters are characters that do not act as separators in the URL to be detected; and abstracting, according to a set rule, the character string between each pair of separators after the replacement. URL path information consists of letters, digits and other characters. Some of those characters act as separators (the special characters), and the rest carry information. Preferably, the non-special characters are converted into letters or digits so that only the separators remain. This makes the structural features of the URL more prominent, reduces the difficulty of recognizing the path information, and makes the URL easier to abstract.
Table 1
Regular expression                          Replacement
'[%\+!\[\];&]'                              '0v0'
'(?<=\w)\.(?![a-zA-Z]+$)'                   'v'
'(?<=[a-zA-Z]-)(\w+-)+(?=\d+$)'             'v'
'(?<=[a-zA-Z])(\d*[-_]\d*)+(?=[a-zA-Z])'    '0'
Table 1 lists the regular-expression replacement rules provided in this embodiment. As shown in Table 1, some characters in a string are replaced by the digit 0 or the letter v, and some are replaced by 0v0, a combination of the digit 0 and the letter v. Those characters will later be abstracted as the third identifier, just like a mixed string of digits and letters, so replacing them with both 0 and v simplifies uniform processing later. Note that 0 and v merely stand for a digit and a letter; in practice any pair, such as 1 and u or 2 and b, can be used.
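Table 1 can be applied with Python's re.sub, one rule at a time. This is a minimal sketch that assumes the rules run top to bottom in the listed order; the function name regular_replace is illustrative. Python's re accepts the fixed-width lookbehinds these rules use.

```python
import re

# Table 1 as (regular expression, replacement) pairs, applied top to bottom.
TABLE1_RULES = [
    (r"[%\+!\[\];&]", "0v0"),                          # %, +, !, [, ], ; and & become "0v0"
    (r"(?<=\w)\.(?![a-zA-Z]+$)", "v"),                  # a dot not followed by a letters-only suffix that ends the path
    (r"(?<=[a-zA-Z]-)(\w+-)+(?=\d+$)", "v"),            # "x-" groups between a "letter-" prefix and trailing digits
    (r"(?<=[a-zA-Z])(\d*[-_]\d*)+(?=[a-zA-Z])", "0"),   # '-' or '_' (with any digits) between two letters
]


def regular_replace(path: str) -> str:
    """Apply every Table 1 rule to the path and return the rewritten path."""
    for regex, replacement in TABLE1_RULES:
        path = re.sub(regex, replacement, path)
    return path


if __name__ == "__main__":
    print(regular_replace("/news/74686/chandao-8-2-3"))
```

For /news/74686/chandao-8-2-3, the third rule turns "8-2-" into "v" (giving chandao-v3), and the fourth then turns the hyphen between the two letters into "0", so the result is /news/74686/chandao0v3. A file extension such as ".html" keeps its dot, so the dot still acts as a separator.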
After the regular-expression replacement, the path information of the URL to be detected contains only letters, digits and separators. It is split at the separators into several character strings, and each string is abstracted.
More specifically, this embodiment provides a way of abstracting, according to a set rule, the character string between each pair of separators in the URL to be detected after the regular-expression replacement, which includes:
if the string between separators is alphabetic, it is abstracted into a first identifier; if it is numeric, into a second identifier; and if it consists of both digits and letters, into a third identifier. An alphabetic string consists only of letters, a numeric string consists only of digits, and a string of digits and letters contains both. Strings of different composition are abstracted with different identifiers. For example, "\c" denotes a letter, "\d" a digit and "\w" a digit or letter, while "+" denotes a run of one or more such characters. Combining "+" with "\c", "\d" and "\w" gives the first, second and third identifiers "\c+", "\d+" and "\w+"; the string "abcderf", for instance, is abstracted as "\c+". Abstracting every string in the path information of the URL to be detected yields its native pattern. Table 2 gives examples of native patterns in this embodiment: the paths of four different URLs are abstracted into two native patterns. /reach296/p/3816387.html and /reach296/p/4001918.html are abstracted as /\w+/\c+/\d+\.\c+, and /reach296/wlwmanifest.xml and /reach296/default.html are abstracted as /\w+/\c+\.\c+.
Generally, URLs with the same native pattern share the same processing logic and can therefore be treated as one family. In Table 2, /reach296/p/3816387.html and /reach296/p/4001918.html form one family, and /reach296/wlwmanifest.xml and /reach296/default.html form another.
Table 2: examples of native patterns
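The identifier rules can be sketched as follows, assuming the path has already passed through the Table 1 replacement. Two choices here are illustrative assumptions: every run of non-alphanumeric characters counts as a separator, and a literal dot is written as \. in the resulting pattern. The function names are hypothetical.

```python
import re

SEPARATOR = re.compile(r"([^a-zA-Z0-9]+)")  # the capture group keeps the separators


def abstract_string(s: str) -> str:
    """Abstract one string between separators into an identifier."""
    if not s:
        return ""
    if re.fullmatch(r"[a-zA-Z]+", s):
        return r"\c+"  # first identifier: letters only
    if re.fullmatch(r"[0-9]+", s):
        return r"\d+"  # second identifier: digits only
    return r"\w+"      # third identifier: digits and letters


def native_pattern(path: str) -> str:
    """Build the native pattern of an already-replaced URL path."""
    parts = SEPARATOR.split(path)
    # re.split with a capture group alternates: string, separator, string, ...
    return "".join(
        abstract_string(part) if i % 2 == 0 else part.replace(".", r"\.")
        for i, part in enumerate(parts)
    )


if __name__ == "__main__":
    for p in ["/reach296/p/3816387.html", "/reach296/p/4001918.html",
              "/reach296/wlwmanifest.xml", "/reach296/default.html"]:
        print(p, native_pattern(p))
```

The four example paths collapse into two native patterns, which are the two families described above.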
After the native pattern of the URL to be detected is obtained, the URL to be detected is merged, according to its native pattern, into the URL family to which it belongs; the URL family is a pattern tree constructed from the URLs in the family by layer-by-layer abstraction. This embodiment provides a method of merging the URL to be detected into the URL family: acquiring N historical URLs from the URL family, where N is a positive integer; for each historical URL, comparing the URL to be detected with the historical URL pairwise to obtain their differences; abstracting the differences layer by layer to construct a sub-pattern tree of the URL to be detected and the historical URL; and merging the sub-pattern tree into the URL family. Preferably, the N historical URLs belong to different deepest URL patterns, a deepest URL pattern being the deepest pattern on a branch of the family's pattern tree. Alternatively, the N most recent historical URLs can be taken from the URL family. Ideally, every historical URL in the family would be compared with the URL to be detected, but for a large site this is too expensive. Whether the N historical URLs are selected by URL pattern or by processing time, they should be as representative as possible, which preserves detection accuracy while increasing processing speed.
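Both selection strategies for the N historical URLs can be sketched in a few lines. The data layout is an assumption: a timestamped history list, and a dict mapping each deepest pattern to its member URLs in insertion order. Picking the last-added member of each deepest pattern is also an assumption; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def latest_n(history: List[Tuple[float, str]], n: int) -> List[str]:
    """Take the N most recently processed URLs; `history` holds (timestamp, path) pairs."""
    newest_first = sorted(history, key=lambda item: item[0], reverse=True)
    return [path for _, path in newest_first[:n]]


def one_per_deepest_pattern(deepest: Dict[str, List[str]], n: int) -> List[str]:
    """Take one URL (the last-added member) from each deepest pattern, at most N in total."""
    picks = [members[-1] for members in deepest.values() if members]
    return picks[:n]


if __name__ == "__main__":
    deepest = {
        r"/cate/10870\d/": ["/cate/108705/", "/cate/108709/"],
        r"/cate/1087\d\d/": ["/cate/108712/"],
    }
    print(one_per_deepest_pattern(deepest, 5))
```

Sampling one URL per deepest pattern spreads the comparisons across branches of the pattern tree, while taking the latest N favours whatever the crawler has seen most recently.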
To process each historical URL, the URL to be detected is compared with it character by character, and the differences are then abstracted over several layers. Table 3 shows how comparison results are marked in this embodiment, using a different character for each result. Segment 1 and segment 2 are character-string segments of the path information of the URL to be detected and of the historical URL, respectively. A character present only in segment 1 is marked '-', and a character present only in segment 2 is marked '+'. A character present in both segments is marked with a blank (' ').
Table 3
Marker   Meaning
'-'      Present only in segment 1
'+'      Present only in segment 2
' '      Present in both segment 1 and segment 2
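The marking in Table 3 can be reproduced with Python's standard difflib.SequenceMatcher, used here as a stand-in character comparator. The text does not say which diff algorithm is used, and mark_differences is a hypothetical name.

```python
import difflib


def mark_differences(seg1, seg2):
    """Pair each character with its Table 3 marker.

    '-' : only in seg1, '+' : only in seg2, ' ' : present in both.
    """
    matcher = difflib.SequenceMatcher(None, seg1, seg2, autojunk=False)
    marks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            marks.extend((" ", ch) for ch in seg1[i1:i2])
        else:  # 'replace', 'delete' or 'insert'
            marks.extend(("-", ch) for ch in seg1[i1:i2])
            marks.extend(("+", ch) for ch in seg2[j1:j2])
    return marks
```

For /cate/108705/ and /cate/108709/, the only non-blank marks are ('-', '5') and ('+', '9').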
Once the initial comparison result of the URL to be detected and the historical URL is obtained, their differences are abstracted for the first time. In this first abstraction, letters in the differing part are represented by '\c' and digits by '\d'. Optionally, this is done with regular-expression replacement. Table 4 lists the replacement rules used in this embodiment.
Table 4
Round   Regular expression            Replacement
1       '[a-zA-Z\+]|%[0-9A-Z]{2}'     '\c'
1       '\d'                          '\d'
2       '(?<!\\)[a-zA-Z]'             '\c'
2       '\d'                          '\d'
3       '(\\d)+'                      '\d+'
3       '(\\c)+'                      '\c+'
4       '[^/]+'                       '\w+'
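Table 4 can be read as one list of rules per abstraction round. The sketch below applies a round with re.sub and returns each replacement from a function, so that '\c' is inserted literally rather than parsed as a regex escape. Reading the first column as the round number is an assumption, and apply_round is a hypothetical name.

```python
import re

# Table 4 read as: round -> list of (regex, literal replacement), applied in order.
RULES = {
    1: [(r"[a-zA-Z\+]|%[0-9A-Z]{2}", r"\c"), (r"\d", r"\d")],
    2: [(r"(?<!\\)[a-zA-Z]", r"\c"), (r"\d", r"\d")],
    3: [(r"(\\d)+", r"\d+"), (r"(\\c)+", r"\c+")],
    4: [(r"[^/]+", r"\w+")],
}


def apply_round(text: str, rnd: int) -> str:
    """Apply every rule of one abstraction round, in table order.

    Round 1 is meant for raw (not yet abstracted) characters only: its letter
    rule has no look-behind and would rewrite the 'd' of an existing \\d.
    """
    for pattern, replacement in RULES[rnd]:
        text = re.sub(pattern, lambda _m, rep=replacement: rep, text)
    return text
```

Round 2's look-behind skips the letter in an existing '\c' or '\d'. This is why /cate/10870\d/ becomes /\c\c\c\c/\d\d\d\d\d\d/ rather than corrupting the '\d'.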
After the first abstraction, abstraction continues until the URL to be detected and the historical URL are reduced to the same URL pattern. At that point, the URL to be detected, the historical URL and the URL patterns produced by the successive abstractions together form the sub-pattern tree of the two URLs. Because the historical URL and the URL to be detected belong to the same URL family, they are guaranteed to reach a common URL pattern eventually. A sub-pattern tree with a single root node can therefore always be generated. Take the URL paths /cate/108705/ and /cate/108709/ as an example. The preliminary comparison shows that they differ only in the last digit (5 versus 9). The first abstraction turns both into /cate/10870\d/, so they already share a URL pattern after one round, and the sub-pattern tree is:
/cate/10870\d/
    /cate/108705/
    /cate/108709/
As another example, comparing and abstracting the URL paths /pick/1/ and /cat/12/ yields the following sub-pattern tree:
[Sub-pattern tree provided as an image in the original publication; not reproduced here.]
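Combining Tables 3 and 4, pairwise layer-by-layer abstraction might be sketched as follows. The sketch rests on three assumptions that the text does not state. First, both paths have the same number of '/'-separated segments, as URLs of one family should. Second, round 1 abstracts only the differing characters within a segment. Third, later rounds are applied only to segments that still differ. All function names are hypothetical.

```python
import difflib
import re

# Table 4 rules, repeated so this sketch runs on its own.
RULES = {
    1: [(r"[a-zA-Z\+]|%[0-9A-Z]{2}", r"\c"), (r"\d", r"\d")],
    2: [(r"(?<!\\)[a-zA-Z]", r"\c"), (r"\d", r"\d")],
    3: [(r"(\\d)+", r"\d+"), (r"(\\c)+", r"\c+")],
    4: [(r"[^/]+", r"\w+")],
}


def apply_round(text, rnd):
    for pattern, replacement in RULES[rnd]:
        text = re.sub(pattern, lambda _m, rep=replacement: rep, text)
    return text


def abstract_differing_chars(a, b):
    """Round 1: abstract only the characters that differ between two segments."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out_a, out_b = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out_a.append(a[i1:i2])
            out_b.append(b[j1:j2])
        else:  # 'replace', 'delete' or 'insert'
            out_a.append(apply_round(a[i1:i2], 1))
            out_b.append(apply_round(b[j1:j2], 1))
    return "".join(out_a), "".join(out_b)


def build_sub_pattern_chain(path_a, path_b):
    """Abstract two paths round by round until they reach a shared pattern.

    Returns (root, chain_a, chain_b). Each chain lists the distinct patterns
    from the concrete path up to, but not including, the shared root.
    """
    segs_a, segs_b = path_a.split("/"), path_b.split("/")
    if len(segs_a) != len(segs_b):
        raise ValueError("assumed: paths in one family have equal segment counts")
    chain_a, chain_b = [path_a], [path_b]
    rnd = 0
    while segs_a != segs_b:
        rnd += 1
        if rnd > 4:
            raise RuntimeError("paths did not converge within Table 4's rounds")
        for k, (sa, sb) in enumerate(zip(segs_a, segs_b)):
            if sa == sb:
                continue  # only still-differing segments are abstracted further
            if rnd == 1:
                segs_a[k], segs_b[k] = abstract_differing_chars(sa, sb)
            else:
                segs_a[k], segs_b[k] = apply_round(sa, rnd), apply_round(sb, rnd)
        for chain, segs in ((chain_a, segs_a), (chain_b, segs_b)):
            current = "/".join(segs)
            if current != chain[-1]:  # record a new tree level only if it changed
                chain.append(current)
    root = "/".join(segs_a)
    chain_a.pop()  # the last entry of each chain is the shared root
    chain_b.pop()
    return root, chain_a, chain_b
```

Under these assumptions, /cate/108705/ and /cate/108709/ meet at /cate/10870\d/ after one round. /pick/1/ and /cat/12/ meet at /\c+/\d+/ after three rounds. The intermediate patterns produced for the second pair are those of this sketch and may differ from the original figure.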
For each historical URL, once the sub-pattern tree of the URL to be detected and that historical URL has been obtained, the sub-pattern tree is merged into the pattern tree of the URL family. Preferably, nodes of the sub-pattern tree and of the URL family pattern tree are compared using regular expressions. Table 5 gives the comparison rules of this embodiment. In Table 5, node A is a URL pattern in the sub-pattern tree and node B is a URL pattern in the URL family pattern tree. When the expression of node A completely contains the expression of node B, the comparison result is 1. For example, let node A be '/cate/\d+/' and node B be '/cate/1087\d\d/'. Here '\d+' denotes a string of one or more digits, while '1087\d\d' denotes a six-digit number whose first four digits are fixed as 1087 and whose last two are arbitrary digits. Clearly '/cate/\d+/' contains '/cate/1087\d\d/', so the comparison result is 1. Conversely, when the expression of node B completely contains the expression of node A, the result is -1. When neither expression contains the other, the result is 0.
Table 5
[Table 5 is provided as an image in the original publication and is not reproduced here.]
More specifically, when comparing node A with node B, the expression of node A may be translated into a regular expression. The expression of node B is translated into ordinary strings, which are then matched against that regular expression. Preferably, the node B expression is translated into at least two strings, and the match counts as successful only if all of them match. For example, the node B expression /cat/\d/ can be translated into /cat/0/ and /cat/9/. If both match the regular expression translated from node A, node A is considered to contain node B. Table 6 gives example translation rules for URL pattern expressions.
Table 6
[Table 6 is provided as an image in the original publication and is not reproduced here.]
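The containment test behind Table 5 can be sketched by translating node A into a regular expression and node B into two sample strings, as described above. Tables 5 and 6 are not reproduced, so the token-to-regex mapping and the sample values are assumptions. The sample values for \d (0 and 9) follow the /cat/0/ and /cat/9/ example. The check is a heuristic: matching a few samples does not prove containment in general.

```python
import re

# A pattern expression is a sequence of abstraction tokens (\c, \d, \w,
# optionally followed by +) and literal characters.
TOKEN = re.compile(r"\\[cdw]\+?|.")
LETTER = r"(?:[a-zA-Z+]|%[0-9A-Z]{2})"  # what \c stands for (Table 4, round 1)
TO_REGEX = {
    r"\c": LETTER, r"\c+": LETTER + "+",
    r"\d": "[0-9]", r"\d+": "[0-9]+",
    r"\w": "[^/]", r"\w+": "[^/]+",
}
# Hypothetical stand-in for Table 6: two sample strings per token.
SAMPLES = {
    r"\c": ("a", "z"), r"\c+": ("a", "zz"),
    r"\d": ("0", "9"), r"\d+": ("0", "99"),
    r"\w": ("a", "9"), r"\w+": ("a0", "9z"),
}


def to_regex(expr):
    """Translate a pattern expression (node A) into a compiled regex."""
    return re.compile("".join(TO_REGEX.get(t, re.escape(t)) for t in TOKEN.findall(expr)))


def to_samples(expr):
    """Translate a pattern expression (node B) into two concrete strings."""
    tokens = TOKEN.findall(expr)
    return ["".join(SAMPLES[t][i] if t in SAMPLES else t for t in tokens) for i in (0, 1)]


def contains(a, b):
    """True if every sample string of b fully matches the regex of a."""
    regex = to_regex(a)
    return all(regex.fullmatch(s) for s in to_samples(b))


def compare(a, b):
    """Table 5 result: 1 if A contains B, -1 if B contains A, otherwise 0."""
    if contains(a, b):
        return 1
    if contains(b, a):
        return -1
    return 0
```

With these rules, compare(r"/cate/\d+/", r"/cate/1087\d\d/") returns 1, matching the worked example in the text.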
More specifically, an embodiment of the present invention provides a pattern tree merging method, including:
comparing the nodes of the sub-pattern tree with the nodes of the URL family pattern tree in order from shallow to deep, where the sub-pattern tree nodes are URL patterns in the sub-pattern tree and the URL family pattern tree nodes are URL patterns in the URL family pattern tree; if a node of the sub-pattern tree and a node of the URL family pattern tree neither contain nor are contained in each other, merging the two nodes directly; if the node of the sub-pattern tree contains the node of the URL family pattern tree, comparing the child nodes of that sub-pattern tree node with the URL family pattern tree node, until the sub-pattern tree is merged into the URL family pattern tree; and if the node of the URL family pattern tree contains the node of the sub-pattern tree, comparing the child nodes of that URL family pattern tree node with the sub-pattern tree node, until the sub-pattern tree is merged into the URL family pattern tree.
Merging a sub-pattern tree into the URL family pattern tree is a recursive operation with three possible cases:
Case 1: the two pattern trees are compared starting from their root nodes. When the root-node comparison result is 0, the sub-pattern tree and the URL family pattern tree are merged directly. Fig. 2 is the first schematic diagram of pattern tree merging according to an embodiment of the present invention. As shown in Fig. 2, the root nodes of the two pattern trees are merged directly.
Case 2: when the root-node comparison result is -1, the child pattern trees of the URL family pattern tree become the new trees to compare and are compared with the sub-pattern tree again. This repeats until the sub-pattern tree is merged into the URL family pattern tree. Fig. 3 is the second schematic diagram of pattern tree merging according to an embodiment of the present invention. As shown in Fig. 3, the root node of the sub-pattern tree ends up as a URL pattern node in the URL family pattern tree, and the sub-pattern tree is merged in as a child pattern tree of a URL family pattern.
Case 3: when the root-node comparison result is 1, the roles in Case 2 are swapped. The child pattern trees of the sub-pattern tree become the new trees to compare and are compared with the URL family pattern tree again. This repeats until the sub-pattern tree and the URL family pattern tree are merged.
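The three cases can be sketched as a recursive insertion that reuses the containment test sketched for Table 5 (repeated here so the block runs on its own). This is one possible reading, not the patented procedure. When a family node contains the sub-tree root, the sketch descends into that family node (Case 2). When an existing family node is unrelated to the sub-tree root, the sub-tree becomes a new branch (Case 1). When the sub-tree root contains family nodes, the sketch re-parents those nodes under the new root rather than discarding it; this handling of Case 3 is an assumption. Node and insert_subtree are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Containment test from the Table 5 sketch (hypothetical token mapping).
TOKEN = re.compile(r"\\[cdw]\+?|.")
LETTER = r"(?:[a-zA-Z+]|%[0-9A-Z]{2})"
TO_REGEX = {r"\c": LETTER, r"\c+": LETTER + "+", r"\d": "[0-9]",
            r"\d+": "[0-9]+", r"\w": "[^/]", r"\w+": "[^/]+"}
SAMPLES = {r"\c": ("a", "z"), r"\c+": ("a", "zz"), r"\d": ("0", "9"),
           r"\d+": ("0", "99"), r"\w": ("a", "9"), r"\w+": ("a0", "9z")}


def contains(a, b):
    regex = re.compile("".join(TO_REGEX.get(t, re.escape(t)) for t in TOKEN.findall(a)))
    tokens = TOKEN.findall(b)
    samples = ["".join(SAMPLES[t][i] if t in SAMPLES else t for t in tokens) for i in (0, 1)]
    return all(regex.fullmatch(s) for s in samples)


def compare(a, b):
    return 1 if contains(a, b) else (-1 if contains(b, a) else 0)


@dataclass(eq=False)  # identity comparison, so list.remove() removes the exact node
class Node:
    pattern: str
    children: list = field(default_factory=list)


def insert_subtree(parent, sub):
    """Place the sub-tree `sub` below `parent`, which is assumed to contain it."""
    absorbed = []
    for child in parent.children:
        if child.pattern == sub.pattern:   # same pattern: merge the children
            for grandchild in sub.children:
                insert_subtree(child, grandchild)
            return
        relation = compare(sub.pattern, child.pattern)
        if relation == -1:                 # Case 2: family node contains sub root
            insert_subtree(child, sub)
            return
        if relation == 1:                  # Case 3: sub root contains family node
            absorbed.append(child)
    for child in absorbed:                 # assumption: re-parent contained nodes
        parent.children.remove(child)
        insert_subtree(sub, child)
    parent.children.append(sub)            # new branch (Case 1, or after Case 3)
```

In this sketch, inserting a sub-tree rooted at /\c+/\d+/ beside an existing /cate/10870\d/ branch moves that branch under the new root. Inserting another /cate/10870\d/ sub-tree later merges its leaf into the existing node.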
The embodiment of the present invention gives a concrete example of merging a sub-pattern tree into a URL family pattern tree. The sub-pattern tree is:
[Pattern tree provided as an image in the original publication; not reproduced here.]
After the two trees are merged, the new URL family pattern tree is:
[Pattern tree provided as an image in the original publication; not reproduced here.]
The URL to be detected is compared pairwise with each of the N historical URLs, producing N sub-pattern trees. Once all N have been merged into the URL family pattern tree, the representative pattern of the URL to be detected is extracted from that tree. Specifically, the embodiment of the present invention determines the representative pattern as follows. Starting from the URL to be detected, the pattern tree of the URL family is traversed layer by layer from deep to shallow. When a pattern node is found whose number of unabstracted URLs exceeds a preset threshold, that pattern node is the representative pattern of the URL to be detected. The new URL family pattern tree from the merging example above is used for illustration. Note that in practice the representative pattern is not extracted from that tree, but from the URL family pattern tree after all N sub-pattern trees have been merged. Assume the URL to be detected is /cate/108709/. Its URL patterns in the new URL family pattern tree, arranged by level from deep to shallow, and the number of unabstracted URLs under each pattern are:
[Pattern list provided as an image in the original publication; not reproduced here.]
As shown, a single URL to be detected corresponds to URL patterns at several levels. Preferably, the representative pattern is the deepest of these patterns whose number of unabstracted URLs reaches the preset threshold. For example, with a preset threshold of 2 the representative pattern is /cate/10870\d/, and with a preset threshold of 3 it is /\c+/\d+/. The preset threshold may be set empirically, set according to the actual situation, or computed. Preferably, the embodiment of the present invention computes the preset threshold T as shown in Equation (1):
$$T = \max\left(3,\ \frac{N_{\mathrm{URL}}}{\max(D,\ 1)}\right) \qquad (1)$$
where $N_{\mathrm{URL}}$ is the number of URLs and $D$ is the depth of the family pattern tree. As Equation (1) shows, T is never less than 3.
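Representative-pattern selection and Equation (1) can be sketched together. The text describes the threshold as being both "reached" and "exceeded". This sketch uses >= because that reproduces the worked example: a threshold of 2 gives /cate/10870\d/ and a threshold of 3 gives /\c+/\d+/. Concrete URLs are assumed to be stored as leaf nodes, and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical pattern-tree node; concrete (unabstracted) URLs are leaves."""
    pattern: str
    children: list = field(default_factory=list)


def url_count(node):
    """Number of unabstracted URLs (leaves) under a pattern node."""
    if not node.children:
        return 1
    return sum(url_count(child) for child in node.children)


def path_to(node, url):
    """Nodes from the root down to the leaf holding `url`, or [] if absent."""
    if not node.children:
        return [node] if node.pattern == url else []
    for child in node.children:
        below = path_to(child, url)
        if below:
            return [node] + below
    return []


def preset_threshold(n_urls, tree_depth):
    """Equation (1): T = max(3, N_URL / max(D, 1))."""
    return max(3, n_urls / max(tree_depth, 1))


def representative_pattern(root, url, threshold):
    """Walk from the URL's parent towards the root and return the first
    (deepest) pattern whose count of unabstracted URLs reaches the threshold."""
    for node in reversed(path_to(root, url)[:-1]):
        if url_count(node) >= threshold:
            return node.pattern
    return None
```

Consider a hypothetical tree in which /\c+/\d+/ has two children: /cate/10870\d/ (holding /cate/108705/ and /cate/108709/) and /\c\c\c\c/\d/ (holding /pick/1/). For /cate/108709/, a threshold of 2 selects /cate/10870\d/, and a threshold of 3 selects /\c+/\d+/.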
After the representative pattern is determined, the system checks whether it has already been detected. If a historical URL under the representative pattern has been detected, the representative pattern is considered detected. In that case, the detection result of the representative pattern is used as the detection result of the URL to be detected. Otherwise, the URL to be detected is itself detected. Table 7 shows an example of detecting a URL to be detected, using the URL /cate/108709/:
- When the representative pattern is /cate/10870\d/, the extracted parameter is 9, and a payload is constructed from the parameter 9 for URL detection.
- When the representative pattern is /cate/\d\d\d\d\d\d/, the extracted parameter is 108709, and a payload is constructed from 108709 for URL detection.
- When the representative pattern is /\c\c\c\c/\d\d\d\d\d\d/, the extracted parameters are cate and 108709, and a payload is constructed from both for URL detection.
After the URL to be detected has been detected, its detection result is also used as the detection result of the representative pattern.
Table 7
[Table 7 is provided as an image in the original publication and is not reproduced here.]
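Parameter extraction for Table 7 can be sketched by turning each run of consecutive abstract tokens in the representative pattern into one capture group. For the three patterns in the example, this yields 9; 108709; and cate plus 108709. The token mapping, the detection cache keyed by representative pattern, and the scanner callback (which would build and send payloads) are hypothetical stand-ins for the real detection engine.

```python
import re

TOKEN = re.compile(r"\\[cdw]\+?|.")
LETTER = r"(?:[a-zA-Z+]|%[0-9A-Z]{2})"
TO_REGEX = {r"\c": LETTER, r"\c+": LETTER + "+", r"\d": "[0-9]",
            r"\d+": "[0-9]+", r"\w": "[^/]", r"\w+": "[^/]+"}


def extract_parameters(pattern, url):
    """Return the URL substrings matched by each run of abstract tokens."""
    parts, run = [], []
    for tok in TOKEN.findall(pattern) + [None]:  # None flushes the final run
        if tok in TO_REGEX:
            run.append(TO_REGEX[tok])
            continue
        if run:  # consecutive abstract tokens form a single parameter
            parts.append("(" + "".join(run) + ")")
            run = []
        if tok is not None:
            parts.append(re.escape(tok))
    match = re.fullmatch("".join(parts), url)
    return list(match.groups()) if match else None


def detect(url, pattern, cache, scanner):
    """Reuse a representative pattern's result, or detect the URL and record it."""
    if pattern in cache:  # pattern already detected: share its result
        return cache[pattern]
    params = extract_parameters(pattern, url)
    result = scanner(url, params)  # hypothetical callback that builds payloads
    cache[pattern] = result  # the URL's result becomes the pattern's result
    return result
```

Once one URL under /cate/10870\d/ has been scanned, other URLs under that pattern reuse the cached result without another scan.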
In summary, an embodiment of the present invention provides a URL detection method, including: acquiring path information of a URL to be detected, the URL to be detected being a rewritten URL; abstracting the path information to obtain the native pattern of the URL to be detected; determining, according to the native pattern, the URL family to which the URL to be detected belongs, the URL family having the same native pattern as the URL to be detected; merging the URL to be detected into the URL family; selecting a representative pattern of the URL to be detected from the URL family; and extracting parameters from the URL to be detected according to the representative pattern for detection, and using the detection result of the URL to be detected as the detection result of the representative pattern. The native pattern obtained by abstracting the path information is a coarse feature of the URL to be detected. The URL is merged into its URL family according to that native pattern, so all URLs in the family share this coarse feature. Within the URL family, the coarse feature is refined under the native pattern into several URL pattern branches. One of these URL patterns, the representative pattern, captures the structural features of the URL to be detected, and the parts of the URL that are abstracted in the representative pattern are its characteristic parts.
Based on the same technical concept, an embodiment of the present invention further provides a detection device that can perform the method embodiments above. Fig. 4 is a schematic structural diagram of the detection device according to an embodiment of the present invention. As shown in Fig. 4, the detection device 400 includes:
an obtaining module 401, configured to obtain path information of a URL to be detected, the URL to be detected being a rewritten URL;
an abstraction module 402, configured to abstract the path information of the URL to be detected to obtain its native pattern;
a query module 403, configured to determine, according to the native pattern of the URL to be detected, the URL family to which it belongs, the URL family having the same native pattern as the URL to be detected;
a merging module 404, configured to merge the URL to be detected into the URL family;
an extraction module 405, configured to select a representative pattern of the URL to be detected from the URL family; and
a processing module 406, configured to extract parameters from the URL to be detected according to the representative pattern for detection, and to use the detection result of the URL to be detected as the detection result of the representative pattern.
Optionally, the abstraction module 402 is specifically configured to:
convert non-special characters in the URL to be detected into letters or digits through regular-expression replacement, the non-special characters being characters in the URL to be detected that do not act as separators; and
abstract, according to a set rule, the character string between each pair of separators in the URL to be detected after the regular-expression replacement.
Optionally, the abstraction module 402 is specifically configured to:
abstract the character string between separators into a first identifier if it is an alphabetic string;
abstract it into a second identifier if it is a numeric string; and
abstract it into a third identifier if it is a string composed of digits and letters.
Optionally, the URL family is a pattern tree constructed from the URLs in the URL family by layer-by-layer abstraction, and
the merging module 404 is specifically configured to:
obtain N historical URLs from the URL family, where N is a positive integer; and
perform the following for each historical URL:
compare the URL to be detected pairwise with the historical URL to obtain the differences between them;
abstract these differences layer by layer to construct a sub-pattern tree of the URL to be detected and the historical URL; and
merge the sub-pattern tree into the URL family.
Optionally, the extraction module 405 is specifically configured to:
traverse the pattern tree of the URL family layer by layer from deep to shallow, starting from the URL to be detected; and
when the pattern tree contains a pattern node whose number of unabstracted URLs exceeds a preset threshold, take that pattern node as the representative pattern of the URL to be detected.
Optionally, the processing module 406 is further configured to:
if the representative pattern has already been detected, use the detection result of the representative pattern as the detection result of the URL to be detected.
Optionally, the processing module 406 is further configured to:
if none of the existing URL families has the same native pattern as the URL to be detected, store the native pattern of the URL to be detected as a new URL family; and
detect the URL to be detected.
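The new-family branch of the processing module can be sketched as a registry keyed by native pattern. FamilyRegistry, its route method and the returned labels are hypothetical. The native-pattern function is passed in, so the sketch does not fix how the native pattern is computed.

```python
from typing import Callable


class FamilyRegistry:
    """Hypothetical store of URL families keyed by native pattern."""

    def __init__(self, native_pattern: Callable[[str], str]):
        self.native_pattern = native_pattern
        self.families: dict[str, list[str]] = {}

    def route(self, path: str) -> str:
        """Decide how a URL to be detected is handled."""
        key = self.native_pattern(path)
        if key not in self.families:
            self.families[key] = [path]  # store the native pattern as a new family
            return "detect-directly"     # nothing to compare with: detect the URL itself
        self.families[key].append(path)  # later merged into the family pattern tree
        return "merge-into-family"
```

The first URL with a given native pattern is detected directly. Later URLs with the same native pattern go through merging and representative-pattern selection.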
In summary, embodiments of the present invention provide a URL detection method and apparatus. The path information of a URL to be detected (a rewritten URL) is acquired and abstracted into the URL's native pattern. The URL family with the same native pattern is determined, and the URL is merged into it. A representative pattern of the URL is selected from the family. Parameters are then extracted from the URL according to the representative pattern for detection, and the detection result of the URL also serves as the detection result of the representative pattern. The native pattern is a coarse feature shared by all URLs in the family. Within the family it is refined into several URL pattern branches, one of which, the representative pattern, captures the structural features of the URL to be detected. The parts abstracted in the representative pattern are the URL's characteristic parts.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (14)

1. A method for detecting a Uniform Resource Locator (URL), comprising:
acquiring path information of a URL to be detected, the URL to be detected being a rewritten URL;
abstracting the path information of the URL to be detected to obtain a native pattern of the URL to be detected;
determining, according to the native pattern of the URL to be detected, a URL family to which the URL to be detected belongs, the URL family having the same native pattern as the URL to be detected;
merging the URL to be detected into the URL family;
selecting a representative pattern of the URL to be detected from the URL family;
extracting parameters from the URL to be detected according to the representative pattern for detection, and taking a detection result of the URL to be detected as a detection result of the representative pattern; and
if the representative pattern has been detected, taking the detection result of the representative pattern as the detection result of the URL to be detected.
2. The method of claim 1, wherein abstracting the path information of the URL to be detected to obtain the native pattern of the URL to be detected comprises:
converting non-special characters in the URL to be detected into letters or digits through regular-expression replacement, the non-special characters being characters in the URL to be detected that do not act as separators; and
abstracting, according to a set rule, the character string between each pair of separators in the URL to be detected after the regular-expression replacement.
3. The method of claim 2, wherein abstracting, according to a set rule, the character string between each pair of separators in the URL to be detected after the regular-expression replacement comprises:
if the character string between separators is an alphabetic string, abstracting it into a first identifier;
if the character string between separators is a numeric string, abstracting it into a second identifier; and
if the character string between separators is composed of digits and letters, abstracting it into a third identifier.
4. The method of claim 1, wherein the URL family is a pattern tree constructed from the URLs in the URL family by layer-by-layer abstraction, and merging the URL to be detected into the URL family comprises:
obtaining N historical URLs from the URL family, wherein N is a positive integer; and
performing the following processing for each historical URL:
comparing the URL to be detected pairwise with the historical URL to obtain the differences between the URL to be detected and the historical URL;
abstracting the differences between the URL to be detected and the historical URL layer by layer to construct a sub-pattern tree of the URL to be detected and the historical URL; and
merging the sub-pattern tree into the URL family.
5. The method of claim 4, wherein merging the sub-pattern tree into the URL family comprises:
comparing the nodes of the sub-pattern tree with the nodes of the URL family pattern tree in order from shallow to deep, wherein the sub-pattern tree nodes are URL patterns in the sub-pattern tree and the URL family pattern tree nodes are URL patterns in the URL family pattern tree;
if a node of the sub-pattern tree and a node of the URL family pattern tree neither contain nor are contained in each other, directly merging the node of the sub-pattern tree and the node of the URL family pattern tree;
if the node of the sub-pattern tree contains the node of the URL family pattern tree, comparing the child nodes of that sub-pattern tree node with the node of the URL family pattern tree until the sub-pattern tree is merged into the URL family pattern tree; and
if the node of the URL family pattern tree contains the node of the sub-pattern tree, comparing the child nodes of that URL family pattern tree node with the node of the sub-pattern tree until the sub-pattern tree is merged into the URL family pattern tree.
6. The method of claim 4, wherein selecting the representative pattern of the URL to be detected from the URL family comprises:
traversing the pattern tree of the URL family layer by layer from deep to shallow, starting from the URL to be detected; and
when the pattern tree contains a pattern node whose number of unabstracted URLs exceeds a preset threshold, taking that pattern node as the representative pattern of the URL to be detected.
7. The method of any one of claims 1 to 6, further comprising:
if none of the existing URL families has the same native pattern as the URL to be detected, storing the native pattern of the URL to be detected as a new URL family; and
detecting the URL to be detected.
8. An apparatus for URL detection, comprising:
an acquisition module, configured to acquire path information of a URL to be detected, the URL to be detected being a rewritten URL;
an abstraction module, configured to abstract the path information of the URL to be detected to obtain a native pattern of the URL to be detected;
a query module, configured to determine, according to the native pattern of the URL to be detected, a URL family to which the URL to be detected belongs, the URL family having the same native pattern as the URL to be detected;
a merging module, configured to merge the URL to be detected into the URL family;
an extraction module, configured to select a representative pattern of the URL to be detected from the URL family; and
a processing module, configured to extract parameters from the URL to be detected according to the representative pattern for detection, and to take a detection result of the URL to be detected as a detection result of the representative pattern;
wherein the processing module is further configured to:
if the representative pattern has been detected, take the detection result of the representative pattern as the detection result of the URL to be detected.
9. The apparatus of claim 8, wherein
the abstraction module is specifically configured to:
convert non-special characters in the URL to be detected into letters or digits through regular-expression replacement, the non-special characters being characters in the URL to be detected that do not act as separators; and
abstract, according to a set rule, the character string between each pair of separators in the URL to be detected after the regular-expression replacement.
10. The apparatus of claim 9, wherein
the abstraction module is specifically configured to:
if the character string between separators is an alphabetic string, abstract it into a first identifier;
if the character string between separators is a numeric string, abstract it into a second identifier; and
if the character string between separators is composed of digits and letters, abstract it into a third identifier.
11. The apparatus of claim 8, wherein
the URL family is a pattern tree constructed from the URLs in the URL family by layer-by-layer abstraction; and
the merging module is specifically configured to:
obtain N historical URLs from the URL family, wherein N is a positive integer; and
perform the following processing for each historical URL:
compare the URL to be detected pairwise with the historical URL to obtain the differences between the URL to be detected and the historical URL;
abstract the differences between the URL to be detected and the historical URL layer by layer to construct a sub-pattern tree of the URL to be detected and the historical URL; and
merge the sub-pattern tree into the URL family.
12. The apparatus of claim 11, wherein
the merging module is specifically configured to:
compare the nodes of the sub-pattern tree with the nodes of the URL family pattern tree in order from shallow to deep, wherein the sub-pattern tree nodes are URL patterns in the sub-pattern tree and the URL family pattern tree nodes are URL patterns in the URL family pattern tree;
if a node of the sub-pattern tree and a node of the URL family pattern tree neither contain nor are contained in each other, directly merge the node of the sub-pattern tree and the node of the URL family pattern tree;
if the node of the sub-pattern tree contains the node of the URL family pattern tree, compare the child nodes of that sub-pattern tree node with the node of the URL family pattern tree until the sub-pattern tree is merged into the URL family pattern tree; and
if the node of the URL family pattern tree contains the node of the sub-pattern tree, compare the child nodes of that URL family pattern tree node with the node of the sub-pattern tree until the sub-pattern tree is merged into the URL family pattern tree.
13. The apparatus of claim 12, wherein
the extraction module is specifically configured to:
traverse the pattern tree of the URL family layer by layer from deep to shallow, starting from the URL to be detected; and
when the pattern tree contains a pattern node whose number of unabstracted URLs exceeds a preset threshold, take that pattern node as the representative pattern of the URL to be detected.
14. The apparatus of any one of claims 8 to 13, wherein
the processing module is further configured to:
if none of the existing URL families has the same native pattern as the URL to be detected, store the native pattern of the URL to be detected as a new URL family; and
detect the URL to be detected.
CN201710108755.7A 2017-02-27 2017-02-27 URL detection method and detection device Active CN106940711B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710108755.7A CN106940711B (en) 2017-02-27 2017-02-27 URL detection method and detection device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710108755.7A CN106940711B (en) 2017-02-27 2017-02-27 URL detection method and detection device

Publications (2)

Publication Number Publication Date
CN106940711A CN106940711A (en) 2017-07-11
CN106940711B true CN106940711B (en) 2020-02-07

Family

ID=59469693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710108755.7A Active CN106940711B (en) 2017-02-27 2017-02-27 URL detection method and detection device

Country Status (1)

Country Link
CN (1) CN106940711B

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898046B * 2020-07-16 2024-02-13 Beijing Skyguard Network Security Technology Co., Ltd. Method and device for redirection management
CN111935133A * 2020-08-06 2020-11-13 Beijing Dingxiang Technology Co., Ltd. White list generation method and device
CN114650152B * 2020-12-17 2023-06-20 Computer Network Information Center, Chinese Academy of Sciences Supercomputing center vulnerability detection method and system
CN113839940B * 2021-09-18 2023-06-06 Beijing Knownsec Information Technology Co., Ltd. URL pattern tree-based defense method, device, electronic equipment and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101727447A * 2008-10-10 2010-06-09 Zhejiang Soufu Network Technology Co., Ltd. Generation method and device of regular expression based on URL
CN102222187A * 2011-06-02 2011-10-19 National Computer Virus Emergency Response Center Domain name structural feature-based detection method for trojan-implanted (malicious) web pages
CN102739679A * 2012-06-29 2012-10-17 Southeast University URL (Uniform Resource Locator) classification-based phishing website detection method
CN104699851A * 2015-04-08 2015-06-10 Shanghai Ideal Information Industry (Group) Co., Ltd. Service tag extension method in big data environment
CN105912573A * 2016-03-30 2016-08-31 Beijing Netentsec Technology Co., Ltd. Data updating method and data updating device

Also Published As

Publication number Publication date
CN106940711A 2017-07-11

Similar Documents

Publication Title
CN106940711B URL detection method and detection device
CN107992481B Regular expression matching method, device and system based on multi-way tree
US20150207704A1 Public opinion information display system and method
CN104036187B Method and system for determining computer virus types
CN105335246B A program crashing defect self-repairing method based on question and answer web analytics
CN105095091B A software defect code file localization method based on Inverted Index Technique
CN113254751B Method, equipment and storage medium for accurately extracting complex webpage structured information
CN113901474B Vulnerability detection method based on function-level code similarity
CN109561163B Method and device for generating uniform resource locator rewriting rule
JP2010231560A Map data error correction device
CN115033895B Binary program supply chain safety detection method and device
CN112445997A Method and device for extracting CMS multi-version identification feature rule
CN112989348A Attack detection method, model training method, device, server and storage medium
CN104765882A Internet website statistics method based on web page characteristic strings
CN109815337B Method and device for determining article categories
CN109471934B Financial risk clue mining method based on Internet
EP2354971A1 Document analysis system
CN107590233B File management method and device
CN111475464B Method for automatically finding and mining fingerprints of Web component
CN113761137B Method and device for extracting address information
CN113806647A Method for identifying development framework and related equipment
CN103870590B Webpage identification method and device with error-reported characteristic
CN109064067B Financial risk operation subject determination method and device based on Internet
CN109614535B Method and device for acquiring network data based on Scapy framework
CN110472125B Multistage page cascading crawling method and equipment based on web crawler

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 3rd Floor, Yitai Building, No. 4 Beiwa Road, Haidian District, Beijing 100089

Patentee after: NSFOCUS Technologies Group Co., Ltd.

Patentee after: NSFOCUS TECHNOLOGIES Inc.

Address before: 3rd Floor, Yitai Building, No. 4 Beiwa Road, Haidian District, Beijing 100089

Patentee before: NSFOCUS INFORMATION TECHNOLOGY Co., Ltd.

Patentee before: NSFOCUS TECHNOLOGIES Inc.
