CN106940711A - URL detection method and detection device - Google Patents

Publication number
CN106940711A
Authority
CN
China
Prior art keywords
url
detected
tree
races
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710108755.7A
Other languages
Chinese (zh)
Other versions
CN106940711B (en)
Inventor
张龙
李志强
王晓琪
刘敏
高学龄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nsfocus Technologies Inc
Nsfocus Technologies Group Co Ltd
Original Assignee
NSFOCUS Information Technology Co Ltd
Beijing NSFocus Information Security Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NSFOCUS Information Technology Co Ltd and Beijing NSFocus Information Security Technology Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566: URL specific, e.g. using aliases, detecting broken or misspelled links

Abstract

Embodiments of the invention disclose a URL detection method and detection device that address the inability of the prior art to extract parameters accurately. The method includes: obtaining the path information of a URL to be detected; abstracting the path information to obtain the primary pattern of the URL to be detected; determining, according to the primary pattern, the URL family to which the URL to be detected belongs, the URL family having the same primary pattern as the URL to be detected; merging the URL to be detected into the URL family; selecting a representative pattern for the URL to be detected from the URL family; and extracting the parameters of the URL to be detected according to the representative pattern, detecting them, and using the detection result of the URL to be detected as the detection result of the representative pattern. Parameters only need to be extracted from the part of the URL to be detected that has been abstracted, which improves the accuracy of parameter extraction and, in turn, the accuracy of URL detection.

Description

A URL detection method and detection device
Technical field
The present invention relates to the field of communication technology, and in particular to a URL detection method and detection device.
Background
When a network application is scanned for risk assessment, every possible point must be detected and assessed. URLs (Uniform Resource Locators) are among the points where vulnerabilities may exist. However, a massive number of URLs makes the scanning process redundant and complicated, or even impossible to complete. In general, URLs of the same type have the same vulnerabilities, so deduplicating URLs of the same type helps assess a website's vulnerabilities more efficiently, precisely by reducing repetitive detection. In practice, parameters and parameter values are extracted to construct the payload used for detection, and deduplication can be applied to pages, directories, parameter names, or even parameter values, according to the detection rules.
The prior art only suits links in the traditional canonical form, which it recognizes through special characters in the URL. For example, for links such as http://www.test.com/admin/easycheck/exerecord/?Batch_id=28 and http://www.test.com/admin/easycheck/exerecord/?Batch_id=29, the "?" in the URL identifies the query string and the "&" in the query string separates the parameters. This is enough to extract and deduplicate parameters: in this example both URLs have the parameter batch_id, with the values 28 and 29 respectively. For detection logic such as cross-site scripting (XSS), only one of the two URLs needs to be detected.
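The traditional approach described above can be sketched as follows. This is a minimal illustration using Python's standard urllib.parse module, not code from the patent; the helper name param_signature and the choice of signature (host, path, sorted parameter names) are assumptions made for the example.

```python
from urllib.parse import urlsplit, parse_qsl

def param_signature(url):
    """Return (host, path, parameter names); equal signatures are treated as duplicates."""
    parts = urlsplit(url)
    names = sorted(name for name, _ in parse_qsl(parts.query, keep_blank_values=True))
    return (parts.netloc, parts.path, tuple(names))

urls = [
    "http://www.test.com/admin/easycheck/exerecord/?Batch_id=28",
    "http://www.test.com/admin/easycheck/exerecord/?Batch_id=29",
]

# Keep only the first URL seen for each signature.
seen = {}
for url in urls:
    seen.setdefault(param_signature(url), url)
to_scan = list(seen.values())

# A rewritten URL carries no query string, so this approach finds no parameters in it.
rewritten_params = parse_qsl(urlsplit("http://www.somebloghost.com/Blogs/2006/12/10/").query)
```

Both example links share one signature, so only one of them is scanned, while the rewritten link yields an empty parameter list.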
However, many websites on the Internet use URL rewriting. For example, after rewriting, http://www.somebloghost.com/Blogs/Posts.php?Year=2006&Month=12&Day=10 can become http://www.somebloghost.com/Blogs/2006/12/10/. Traditional parameter extraction and deduplication no longer apply here. Worse, for a URL such as https://www.oschina.net/news/74686/chandao-8-2-3 it is even harder to tell which parts are the variable, actual parameters. When risk assessment is performed on a large website that uses URL rewriting (such as Zhihu or JD.com), the site is huge and every URL path is different. If the URLs cannot be deduplicated effectively, scanning becomes massively repetitive, redundant, and inefficient, and may even be impossible to complete. If parameters cannot be extracted effectively, scanning becomes aimless, very inaccurate, and meaningless.
In short, the prior art cannot accurately extract parameters from rewritten URLs.
Summary of the invention
The present invention provides a URL detection method and detection device to solve the problem in the prior art that rewritten URLs cannot be accurately deduplicated, which makes detection inefficient.
An embodiment of the present invention provides a uniform resource locator (URL) detection method, including:
obtaining the path information of a URL to be detected, the URL to be detected being a URL that has undergone rewriting;
abstracting the path information of the URL to be detected to obtain the primary pattern of the URL to be detected;
determining, according to the primary pattern of the URL to be detected, the URL family to which the URL to be detected belongs, the URL family having the same primary pattern as the URL to be detected;
merging the URL to be detected into the URL family;
selecting the representative pattern of the URL to be detected from the URL family;
extracting the parameters of the URL to be detected according to the representative pattern and detecting them, and using the detection result of the URL to be detected as the detection result of the representative pattern.
Optionally, abstracting the path information of the URL to be detected to obtain its primary pattern includes:
converting, by regular-expression replacement, the non-special characters in the URL to be detected into letters or digits, the non-special characters including the characters in the URL to be detected that do not act as separators;
abstracting, according to set rules, the character string between each pair of separators in the URL to be detected after the regular-expression replacement.
Optionally, abstracting, according to set rules, the character string between each pair of separators in the URL to be detected after the regular-expression replacement includes:
if the character string between separators is an alphabetic string, abstracting the alphabetic string as a first mark;
if the character string between separators is a digit string, abstracting the digit string as a second mark;
if the character string between separators consists of both digits and letters, abstracting it as a third mark.
Optionally, merging the URL to be detected into the URL family includes:
the URL family being a pattern tree built from the URLs in the family by layer-by-layer abstraction;
obtaining N historical URLs from the URL family, N being a positive integer;
processing each historical URL as follows:
comparing the URL to be detected with the historical URL pairwise to obtain the differences between the URL to be detected and the historical URL;
abstracting the differences between the URL to be detected and the historical URL layer by layer, thereby constructing a sub-pattern tree between the URL to be detected and the historical URL;
merging the sub-pattern tree into the URL family.
Optionally, merging the sub-pattern tree into the URL family includes: comparing, in order from shallow to deep, the nodes of the sub-pattern tree with the nodes of the family pattern tree layer by layer, where a sub-pattern tree node is a URL pattern in the sub-pattern tree and a family pattern tree node is a URL pattern in the family pattern tree;
if the sub-pattern tree node and the family pattern tree node neither contain nor are contained by each other, merging the sub-pattern tree node directly with the family pattern tree node;
if the sub-pattern tree node contains the family pattern tree node, comparing the child nodes under that sub-pattern tree node with the family pattern tree node, until the sub-pattern tree has been merged into the family pattern tree;
if the family pattern tree node contains the sub-pattern tree node, comparing the child nodes of the family pattern tree node with the sub-pattern tree node, until the sub-pattern tree has been merged into the family pattern tree.
Optionally, selecting the representative pattern of the URL to be detected from the URL family includes:
starting from the URL to be detected, traversing the pattern tree of the URL family layer by layer from deep to shallow;
when a pattern node is found under which the number of URLs that have not been abstracted exceeds a preset threshold, taking that pattern node as the representative pattern of the URL to be detected.
Optionally, the method includes:
if the representative pattern has already been detected, using the detection result of the representative pattern as the detection result of the URL to be detected.
Optionally, the method further includes:
if none of the existing URL families has the same primary pattern as the URL to be detected, storing the primary pattern of the URL to be detected as a newly added URL family;
detecting the URL to be detected.
An embodiment of the present invention provides a uniform resource locator (URL) detection device, including:
an acquisition module, configured to obtain the path information of a URL to be detected, the URL to be detected being a URL that has undergone rewriting;
an abstraction module, configured to abstract the path information of the URL to be detected to obtain the primary pattern of the URL to be detected;
a query module, configured to determine, according to the primary pattern of the URL to be detected, the URL family to which the URL to be detected belongs, the URL family having the same primary pattern as the URL to be detected;
a merging module, configured to merge the URL to be detected into the URL family;
an extraction module, configured to select the representative pattern of the URL to be detected from the URL family;
a processing module, configured to extract the parameters of the URL to be detected according to the representative pattern and detect them, and to use the detection result of the URL to be detected as the detection result of the representative pattern.
Optionally, the abstraction module is specifically configured to:
convert, by regular-expression replacement, the non-special characters in the URL to be detected into letters or digits, the non-special characters including the characters in the URL to be detected that do not act as separators;
abstract, according to set rules, the character string between each pair of separators in the URL to be detected after the regular-expression replacement.
Optionally, the abstraction module is specifically configured to:
if the character string between separators is an alphabetic string, abstract the alphabetic string as a first mark;
if the character string between separators is a digit string, abstract the digit string as a second mark;
if the character string between separators consists of both digits and letters, abstract it as a third mark.
Optionally, the URL family is a pattern tree built from the URLs in the family by layer-by-layer abstraction;
the merging module is specifically configured to:
obtain N historical URLs from the URL family, N being a positive integer;
process each historical URL as follows:
compare the URL to be detected with the historical URL pairwise to obtain the differences between the URL to be detected and the historical URL;
abstract the differences between the URL to be detected and the historical URL layer by layer, thereby constructing a sub-pattern tree between the URL to be detected and the historical URL;
merge the sub-pattern tree into the URL family.
Optionally, the merging module is specifically configured to:
compare, in order from shallow to deep, the nodes of the sub-pattern tree with the nodes of the family pattern tree layer by layer, where a sub-pattern tree node is a URL pattern in the sub-pattern tree and a family pattern tree node is a URL pattern in the family pattern tree;
if the sub-pattern tree node and the family pattern tree node neither contain nor are contained by each other, merge the sub-pattern tree node directly with the family pattern tree node;
if the sub-pattern tree node contains the family pattern tree node, compare the child nodes under that sub-pattern tree node with the family pattern tree node, until the sub-pattern tree has been merged into the family pattern tree;
if the family pattern tree node contains the sub-pattern tree node, compare the child nodes of the family pattern tree node with the sub-pattern tree node, until the sub-pattern tree has been merged into the family pattern tree.
Optionally, the extraction module is specifically configured to:
starting from the URL to be detected, traverse the pattern tree of the URL family layer by layer from deep to shallow;
when a pattern node is found under which the number of URLs that have not been abstracted exceeds a preset threshold, take that pattern node as the representative pattern of the URL to be detected.
Optionally, the processing module is specifically configured to:
if the representative pattern has already been detected, use the detection result of the representative pattern as the detection result of the URL to be detected.
Optionally, the processing module is further configured to:
if none of the existing URL families has the same primary pattern as the URL to be detected, store the primary pattern of the URL to be detected as a newly added URL family;
detect the URL to be detected.
In summary, embodiments of the invention provide a URL detection method and detection device, including: obtaining the path information of a URL to be detected, the URL to be detected being a URL that has undergone rewriting; abstracting the path information to obtain the primary pattern of the URL to be detected; determining, according to the primary pattern, the URL family to which the URL to be detected belongs, the URL family having the same primary pattern as the URL to be detected; merging the URL to be detected into the URL family; selecting the representative pattern of the URL to be detected from the URL family; and extracting the parameters of the URL to be detected according to the representative pattern, detecting them, and using the detection result of the URL to be detected as the detection result of the representative pattern. Abstracting the path information of the URL to be detected yields its primary pattern, which is a fairly coarse feature of the URL. According to the primary pattern, the URL is merged into the URL family to which it belongs; the URLs in the family share this coarse feature with the URL to be detected. Within the family, the coarse feature is subdivided under the primary pattern into multiple URL pattern branches. The structure of one of these URL patterns, the representative pattern, represents the structure of the URL to be detected, and the variable parts of the URL to be detected are abstracted in the representative pattern. Parameters therefore only need to be extracted from the abstracted part of the URL to be detected, which improves the accuracy of parameter extraction and, in turn, the accuracy of URL detection.
Brief description of the drawings
To explain the technical solutions of the embodiments of the invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a URL detection method provided by an embodiment of the present invention;
Fig. 2 is the first schematic diagram of pattern tree merging provided by an embodiment of the present invention;
Fig. 3 is the second schematic diagram of pattern tree merging provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a detection device provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.
Fig. 1 is a schematic flowchart of a URL detection method provided by an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
S101: obtain the path information of a URL to be detected, the URL to be detected being a URL that has undergone rewriting;
S102: abstract the path information of the URL to be detected to obtain the primary pattern of the URL to be detected;
S103: determine, according to the primary pattern of the URL to be detected, the URL family to which the URL to be detected belongs, the URL family having the same primary pattern as the URL to be detected;
S104: merge the URL to be detected into the URL family;
S105: select the representative pattern of the URL to be detected from the URL family;
S106: extract the parameters of the URL to be detected according to the representative pattern and detect them, and use the detection result of the URL to be detected as the detection result of the representative pattern.
In a specific implementation, the URL to be detected in the embodiments of the invention is a URL that has undergone URL rewriting. It is any one of the massive number of URLs that need to be detected; after its detection is complete, another undetected URL can be taken from that set as the URL to be detected and the above steps repeated.
A rewritten URL can be decomposed into a protocol name (schema), an address field (host), a port (port), and a path (path). Note that the embodiments of the invention presuppose that all URLs have the same schema, host, and port information: URLs with identical schema, host, and port information form one processing set, and the URLs in a processing set are detected with the detection method disclosed in the embodiments. Therefore, before detection, the URLs must first be grouped by their schema, host, and port information. URLs from the same source generally have the same schema, host, and port information, so the URLs can also be grouped directly by source. Each group of URLs is then detected with the method disclosed in the embodiments.
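The grouping step described above can be sketched as follows. This is a minimal sketch using Python's urllib.parse, not the patent's implementation; the names origin_key, group_by_origin, and DEFAULT_PORTS are hypothetical, and filling in the default port for http and https is an assumption the source does not state.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: a URL without an explicit port uses its protocol's default port.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin_key(url):
    """Return the (schema, host, port) triple used to group URLs."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def group_by_origin(urls):
    """Map each (schema, host, port) triple to the paths of its URLs."""
    groups = defaultdict(list)
    for url in urls:
        groups[origin_key(url)].append(urlsplit(url).path)
    return dict(groups)

groups = group_by_origin([
    "http://www.test.com/a/1/",
    "http://www.test.com:80/a/2/",
    "https://www.test.com/a/3/",
])
```

Each resulting group contains only paths, which is the part of the URL that the remaining steps operate on.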
In a specific implementation of S101, the path information of the URL to be detected contains the URL's parameter and parameter-value information, so the path information is the main object processed by the embodiments of the invention. The path information can be extracted with an existing path identification method.
In a specific implementation of S102, abstracting the path information of the URL to be detected means abstracting the concrete content of that path information. Preferably, the letters, the digits, and their combinations in the path are distinguished with different marks, which abstracts out the compositional structure of the path information.
In a specific implementation of S103, URLs that have already been detected and share a primary pattern form the URL family to which those URLs belong. Because many URLs have already been detected and they can have different primary patterns, there are correspondingly multiple URL families. After the primary pattern of the URL to be detected is obtained, each URL family is traversed; if a URL family has the same primary pattern as the URL to be detected, that family is the URL family to which the URL to be detected belongs. Preferably, if none of the existing URL families has the same primary pattern as the URL to be detected, the primary pattern of the URL to be detected is stored as a newly added URL family, and the URL to be detected is detected.
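A minimal sketch of the family lookup in S103 follows. It swaps the traversal of all families described above for a dictionary keyed by the primary pattern, which gives the same result; the class name FamilyRegistry and its method are hypothetical, and the primary pattern is passed in as an already computed string, written with the \c, \d, \w notation of Table four.

```python
class FamilyRegistry:
    """Keeps one URL family per primary pattern (illustrative structure only)."""

    def __init__(self):
        self.families = {}  # primary pattern -> list of member paths

    def assign(self, path, pattern):
        """Add path to the family with this primary pattern.

        Returns (members, is_new). is_new is True when no existing family had
        this primary pattern, in which case the URL must be detected directly.
        """
        is_new = pattern not in self.families
        members = self.families.setdefault(pattern, [])
        members.append(path)
        return members, is_new

registry = FamilyRegistry()
_, first_is_new = registry.assign("/reach296/p/3816387.html", r"/\w+/\c+/\d+.\c+")
_, second_is_new = registry.assign("/reach296/p/4001918.html", r"/\w+/\c+/\d+.\c+")
```

Here the first path creates a new family and the second joins it.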
In a specific implementation of S104, the URL family is the family to which the URL to be detected belongs. The two have the same primary pattern, so the URL to be detected can always be merged into its family after some degree of abstraction. Preferably, historical URLs are taken from the URL family and compared with the URL to be detected; the local differences between them are abstracted layer by layer until an identical pattern is found, and the URL to be detected is then merged into the URL family together with the intermediate patterns generated during the layer-by-layer abstraction.
In a specific implementation of S105, the URL family after S104 contains multiple URLs and URL patterns, and the representative pattern of the URL to be detected is chosen from the URL patterns merged into the family in S104. Optionally, the representative pattern can be chosen according to the depth of a URL pattern and the number of leaf nodes under it. The depth of a URL pattern is the number of layers of concretization between the primary pattern and that URL pattern when the primary pattern is concretized layer by layer toward the URL to be detected; the more layers, the deeper the URL pattern. The number of leaf nodes under a URL pattern is the number of concrete URLs under that URL pattern.
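The deep-to-shallow selection can be sketched as follows, under stated assumptions: the tree layout, the class PatternNode, the function representative, and the sample paths are all hypothetical, and the source does not say what happens when no node exceeds the threshold, so this sketch returns None in that case.

```python
class PatternNode:
    """A node of a family pattern tree; nodes without children are concrete URLs."""

    def __init__(self, pattern, parent=None):
        self.pattern = pattern
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def leaf_count(self):
        """Number of concrete URLs (leaves) under this node."""
        if not self.children:
            return 1
        return sum(child.leaf_count() for child in self.children)

def representative(leaf, threshold):
    """Walk from the URL's leaf toward the root and return the first pattern
    node with more than `threshold` concrete URLs under it, or None."""
    node = leaf.parent
    while node is not None:
        if node.leaf_count() > threshold:
            return node
        node = node.parent
    return None

# Hypothetical family: a primary pattern, one intermediate pattern, three URLs.
root = PatternNode(r"/\c+/\d+/")
mid = PatternNode(r"/cate/10870\d/", root)
leaf_a = PatternNode("/cate/108705/", mid)
leaf_b = PatternNode("/cate/108709/", mid)
leaf_c = PatternNode("/cate/220001/", root)
```

With a threshold of 1 the intermediate pattern is selected for leaf_a; with a threshold of 2 the search continues up to the primary pattern.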
In a specific implementation of S106, the structure of the representative pattern represents the structure of the URL to be detected, and the variable parts of the URL to be detected are abstracted in the representative pattern. Parameters therefore only need to be extracted from the abstracted part of the URL to be detected, which improves the accuracy of parameter extraction and, in turn, the accuracy of URL detection. For example, if the path of the URL to be detected is /10882/ and its representative pattern is /1088\d/, the extracted parameter is 2; if its representative pattern is /108\d\d/, the extracted parameter is 82.
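The extraction in this example can be sketched by turning the representative pattern into a Python regular expression whose capture groups are the abstracted parts. This is an illustrative sketch, not the patent's implementation: the abstracted digit is written as \d following Table four, the helper names are hypothetical, and treating a run of adjacent marks as one parameter is an assumption chosen to reproduce the value 82 above.

```python
import re

# Marks used in patterns: \c letter, \d digit, \w letter or digit, optional "+".
CLASSES = {"c": "[A-Za-z]", "d": "[0-9]", "w": "[A-Za-z0-9]"}
RUN = re.compile(r"(?:\\[cdw]\+?)+")   # a run of adjacent marks
TOKEN = re.compile(r"\\([cdw])(\+?)")  # one mark inside a run

def pattern_to_regex(pattern):
    """Compile a URL pattern into a regex; each run of marks becomes one group."""
    out, pos = [], 0
    for run in RUN.finditer(pattern):
        out.append(re.escape(pattern[pos:run.start()]))
        body = "".join(CLASSES[kind] + plus for kind, plus in TOKEN.findall(run.group()))
        out.append("(" + body + ")")
        pos = run.end()
    out.append(re.escape(pattern[pos:]))
    return re.compile("".join(out))

def extract_parameters(path, pattern):
    """Return the parts of path covered by marks, or None if path does not fit."""
    match = pattern_to_regex(pattern).fullmatch(path)
    return list(match.groups()) if match else None
```

For instance, extract_parameters("/10882/", r"/1088\d/") gives ["2"] and extract_parameters("/10882/", r"/108\d\d/") gives ["82"], matching the values in the text.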
Preferably, if the representative pattern has already been detected, the detection result of the representative pattern is used as the detection result of the URL to be detected. The detection result of a representative pattern represents the detection results of all leaf-node URLs under that URL pattern, and it is determined by the detection result of one of those leaf-node URLs. Optionally, if the representative pattern has no detection result, none of the leaf-node URLs under it has been detected yet. In that case the URL to be detected is detected and its detection result is used as the detection result of the URL pattern, which deduplicates the URLs and improves URL detection efficiency.
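The reuse of detection results can be sketched as a cache keyed by the representative pattern. This is a minimal sketch: the function detect_with_reuse, the results dictionary, and the scan callable are hypothetical stand-ins, and fake_scan below merely records its calls in place of a real vulnerability check.

```python
def detect_with_reuse(path, representative_pattern, results, scan):
    """Return the stored result of the representative pattern if there is one;
    otherwise scan this URL and store its result for the whole pattern."""
    if representative_pattern in results:
        return results[representative_pattern]
    result = scan(path)
    results[representative_pattern] = result
    return result

# Stand-in scanner that records which paths were actually scanned.
scanned = []
def fake_scan(path):
    scanned.append(path)
    return "no issue found"

results = {}
first = detect_with_reuse("/cate/108705/", r"/cate/10870\d/", results, fake_scan)
second = detect_with_reuse("/cate/108709/", r"/cate/10870\d/", results, fake_scan)
```

Only the first path reaches the scanner; the second reuses the stored result of the shared representative pattern.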
Abstracting the path information of the URL to be detected yields its primary pattern, which is a fairly coarse feature of the URL. According to the primary pattern, the URL is merged into the URL family to which it belongs; the URLs in the family share this coarse feature with the URL to be detected. Within the family, the coarse feature is subdivided under the primary pattern into multiple URL pattern branches. The detection result of one of these URL patterns, the representative pattern, represents the detection result of the URL to be detected: if the representative pattern has already been detected, the URL to be detected does not need to be detected again, which simplifies the URL detection process and speeds up detection. In addition, the method can run in real time and does not require a detection model to be built first, so the detection process is simpler.
An embodiment of the invention provides a feasible way to obtain the primary pattern of the URL to be detected, including: converting, by regular-expression replacement, the non-special characters in the URL to be detected into letters or digits, the non-special characters including the characters in the URL to be detected that do not act as separators; and abstracting, according to set rules, the character string between each pair of separators in the URL to be detected after the regular-expression replacement. URL path information is made up of letters, digits, and other characters. Among those other characters, some act as separators, that is, special characters, and the rest are part of the information content of the path. Preferably, the non-special characters in the URL to be detected are converted into letters or digits and only the separator characters are retained. This makes the structure of the URL more apparent, which makes the path information easier to recognize and the URL easier to abstract.
Table one
Regular expression                                  Replacement
' [%+!\[\];&]'                                      '0v0'
'(<=w) (![a-zA-Z]+$)'                               'v'
'(<=[a-zA-Z] -) (w+-)+(=d+ $) '                     'v'
'(<=[a-zA-Z]) (d* [- _] d*)+(=[a-zA-Z]) '           '0'
Table one shows a set of regular-expression replacement rules provided by an embodiment of the present invention. As Table one shows, some characters are replaced by the digit 0 or the letter v, and some are replaced by a combination of the digit 0 and the letter v. The latter is because these non-special characters will later be abstracted as the third mark, just like strings that mix digits and letters, so replacing them with 0 and v makes the subsequent processing uniform. Note that 0 and v merely illustrate one digit and one letter; in actual use any digit and letter can be used, such as 1 and u or 2 and b.
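As a hedged illustration of the replacement step, the sketch below applies only the first rule of Table one (read without its leading space), since the remaining rules are not fully legible here. The function name normalize and the sample path are assumptions.

```python
import re

# First rule of Table one: non-separator symbols become the mixed token "0v0".
NON_SEPARATOR = re.compile(r"[%+!\[\];&]")

def normalize(path):
    """Replace each matching non-separator symbol with the token 0v0."""
    return NON_SEPARATOR.sub("0v0", path)

normalized = normalize("/news/a+b/")
```

The segment a+b becomes a0v0b, a string of letters and digits, so it is later abstracted as the third mark like any other mixed string.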
After the regular-expression replacement, the path information of the URL to be detected contains only letters, digits, and separators. The path information is divided into multiple character strings at the separators, and each character string is abstracted.
More specifically, an embodiment of the invention provides a method for abstracting, according to set rules, the character string between each pair of separators in the URL to be detected after the regular-expression replacement, including:
If the character string between separators is an alphabetic string, the alphabetic string is abstracted as the first mark; if it is a digit string, the digit string is abstracted as the second mark; if it consists of both digits and letters, it is abstracted as the third mark. An alphabetic string is a string made up of alphabetic characters; likewise, a digit string is a string made up of digit characters; and a string of digits and letters is one whose characters include both digit characters and alphabetic characters. Characters of different composition are abstracted with different marks. For example, "\c" denotes an alphabetic character, "\d" denotes a digit character, and "\w" denotes a digit or alphabetic character. Further, "+" denotes consecutive repetition of characters of the same class, and combining "+" with "\c", "\d", and "\w" yields the first mark, the second mark, and the third mark respectively. For example, the string "abcderf" is abstracted as "\c+". Abstracting every character string in the path information of the URL to be detected finally yields the primary pattern of the URL to be detected. Table two shows examples of primary patterns provided by an embodiment of the present invention. As Table two shows, the path information of four different URLs to be detected is abstracted into two primary patterns: /reach296/p/3816387.html and /reach296/p/4001918.html are abstracted as /\w+/\c+/\d+.\c+, and /reach296/wlwmanifest.xml and /reach296/Default.html are abstracted as /\w+/\c+.\c+. In general, URLs with the same primary pattern have the same processing logic and can therefore be regarded as one family. In Table two, for example, /reach296/p/3816387.html and /reach296/p/4001918.html can be regarded as one family, and /reach296/wlwmanifest.xml and /reach296/Default.html as another.
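The abstraction rules above can be sketched as follows. This is a minimal sketch that assumes the separators are "/" and "." (the only ones appearing in the Table two examples) and that the regular-expression replacement has already been applied; the marks are written \c+, \d+, and \w+ following the notation of Table four, and the function names are hypothetical.

```python
import re

def abstract_segment(segment):
    """Map one string between separators to the first, second, or third mark."""
    if re.fullmatch(r"[A-Za-z]+", segment):
        return r"\c+"   # first mark: letters only
    if re.fullmatch(r"[0-9]+", segment):
        return r"\d+"   # second mark: digits only
    return r"\w+"       # third mark: letters and digits mixed

def primary_pattern(path):
    """Abstract every string between separators and keep the separators."""
    out = []
    for token in re.split(r"([/.])", path):
        if token == "" or token in ("/", "."):
            out.append(token)
        else:
            out.append(abstract_segment(token))
    return "".join(out)
```

For example, primary_pattern("/reach296/p/3816387.html") gives /\w+/\c+/\d+.\c+ and primary_pattern("/reach296/wlwmanifest.xml") gives /\w+/\c+.\c+, the two primary patterns named in the text.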
Table two
After the primary pattern of the URL to be detected is obtained, the URL to be detected must be merged, according to its primary pattern, into the URL family to which it belongs. Here a URL family is a pattern tree built from the URLs in the family by layer-by-layer abstraction. An embodiment of the invention provides a method for merging the URL to be detected into the URL family, including: obtaining N historical URLs from the URL family, N being a positive integer; and processing each historical URL as follows: comparing the URL to be detected with the historical URL pairwise to obtain the differences between them; abstracting the differences layer by layer, thereby constructing a sub-pattern tree between the URL to be detected and the historical URL; and merging the sub-pattern tree into the URL family. Preferably, the N historical URLs taken from the URL family each belong to a different deepest URL pattern, where a deepest URL pattern is the deepest URL pattern on a branch of the pattern tree. Alternatively, the N most recent historical URLs can be taken from the URL family. In theory the ideal is to compare the URL to be detected pairwise with every historical URL in its family, but that is too costly for a large site. Whether the historical URLs are selected by URL pattern or by processing time, the N selected historical URLs should be as representative as possible, so that processing speed improves while detection accuracy is preserved.
To process each historical URL, in a specific implementation, the URL to be detected and the historical URL are compared character by character and the differences are abstracted over multiple layers. Table three shows an example of how comparison results are marked in an embodiment of the present invention, with different characters denoting different comparison results. As Table three shows, fragment 1 and fragment 2 are character string fragments of the path information of the URL to be detected and of the historical URL respectively. A character that exists only in fragment 1 is marked with '-', a character that exists only in fragment 2 is marked with '+', and a character that exists in both fragment 1 and fragment 2 is marked with ' '.
Table three
Character    Meaning
'-'    Exists only in fragment 1
'+'    Exists only in fragment 2
' '    Exists in both fragment 1 and fragment 2
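The marking scheme of Table three can be illustrated with Python's standard difflib module. This is only a sketch under the assumption that a longest-matching-block character diff is an acceptable comparison; the embodiment does not name a diff algorithm, and `mark_differences` is a hypothetical helper.

```python
import difflib

def mark_differences(fragment1: str, fragment2: str):
    """Return (mark, character) pairs using the marks of Table three."""
    marks = []
    matcher = difflib.SequenceMatcher(None, fragment1, fragment2, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Characters present in both fragments.
            marks += [(" ", ch) for ch in fragment1[i1:i2]]
        else:
            # 'replace', 'delete' and 'insert' all reduce to these two loops,
            # because the unused side of the opcode is an empty slice.
            marks += [("-", ch) for ch in fragment1[i1:i2]]
            marks += [("+", ch) for ch in fragment2[j1:j2]]
    return marks

marks = mark_differences("/cate/108705/", "/cate/108709/")
print(",".join(mark.strip() + ch for mark, ch in marks))
```

Here fragment 1 is taken from the URL to be detected and fragment 2 from the history URL, matching the roles described for Table three.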
After the preliminary comparison result of the URL to be detected and the history URL is obtained, the differences between the two undergo a first abstraction, whose purpose is to represent the letters in the differing parts with "\c" and the digits with "\d". Optionally, this processing is done by regex replacement; Table four is a set of regex replacement rules provided by an embodiment of the present invention.
Table four
Round    Regex    Replacement
1    '[a-zA-Z+]|%[0-9A-Z]{2}'    '\c'
1    '\d'    '\d'
2    '(?<!\\)[a-zA-Z]'    '\c'
2    '\d'    '\d'
3    '(\\d)+'    '\d+'
3    '(\\c)+'    '\c+'
4    '[^/]+'    '\w+'
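As a rough illustration of the first abstraction round, the sketch below replaces the letters in the differing parts of two paths with \c and the digits with \d, leaving the common parts untouched. It does not reproduce the rules of Table four; the helper names and the use of difflib to locate the differing parts are assumptions.

```python
import difflib

def _abstract_chars(text: str) -> str:
    """Replace each letter with \\c and each digit with \\d; keep other characters."""
    return "".join(
        r"\d" if ch.isdigit() else r"\c" if ch.isalpha() else ch for ch in text
    )

def first_abstraction(url_a: str, url_b: str):
    """Abstract only the characters in which the two paths differ."""
    pattern_a, pattern_b = [], []
    matcher = difflib.SequenceMatcher(None, url_a, url_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pattern_a.append(url_a[i1:i2])
            pattern_b.append(url_b[j1:j2])
        else:
            pattern_a.append(_abstract_chars(url_a[i1:i2]))
            pattern_b.append(_abstract_chars(url_b[j1:j2]))
    return "".join(pattern_a), "".join(pattern_b)

print(first_abstraction("/cate/108705/", "/cate/108709/"))
```

When the two returned patterns are equal, one round is enough and that pattern becomes the root of the sub-pattern tree; otherwise further, coarser rounds would be needed, which this sketch does not attempt.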
After the first abstraction, abstraction continues until the URL to be detected and the history URL are abstracted into the same URL pattern. At that point, the URL to be detected, the history URL and the URL patterns generated by the repeated abstraction together constitute the sub-pattern tree of the URL to be detected and the history URL. Because the history URL and the URL to be detected come from the same URL family, the two are certain to reach the same URL pattern eventually; that is, a sub-pattern tree with a single common root node is certain to be produced. Take the URL paths /cate/108705/ and /cate/108709/ as an example. The differences between them can be preliminarily marked as: /, c, a, t, e, /, 1, 0, 8, 7, 0, -5, +9, /. Only the digits 5 and 9 differ. The first abstraction of the two URLs gives /cate/10870\d/, so the two URLs reach the same URL pattern after the first abstraction, and the sub-pattern tree has /cate/10870\d/ as its root node with the two URLs beneath it. [Sub-pattern tree diagram not reproduced.]
As another example, for the URL paths /pick/1/ and /cate/12/, a sub-pattern tree is likewise obtained after comparison and abstraction. [Sub-pattern tree diagram not reproduced.]
For each history URL, once the sub-pattern tree of the URL to be detected and that history URL is obtained, the sub-pattern tree is merged into the pattern tree of the URL family. Preferably, the nodes of the sub-pattern tree and of the URL family pattern tree are compared by regex replacement. Table five is a regex-based comparison rule provided by an embodiment of the present invention. As shown in Table five, node A is a URL pattern in the sub-pattern tree and node B is a URL pattern in the URL family pattern tree. When the expression of node A fully contains the expression of node B, the comparison result is 1. For example, suppose the expression of node A is '/cate/\d+/' and the expression of node B is '/cate/1087\d\d/'. "\d+" denotes a multi-digit string, while "1087\d\d" denotes a string whose first four digits are fixed as 1087 and whose last two characters are digits. Clearly '/cate/\d+/' contains '/cate/1087\d\d/', so the comparison result is 1. Correspondingly, when the expression of node B fully contains the expression of node A, the comparison result is -1. When the node A expression cannot contain the node B expression and the node B expression cannot contain the node A expression, the comparison result is 0.
Table five
More specifically, when comparing the node A expression with the node B expression, the node A expression can be translated into a regular expression and the node B expression into ordinary character strings, which are then matched against the regular expression obtained from the node A expression. When translating the node B expression, its possible parameter values must be considered. To avoid accidental matches, one expression is preferably translated into at least two character strings, and the match counts as successful only when all of them match. For example, the node B expression /cat/\d/ can be translated into /cat/0/ and /cat/9/; node A is considered to contain node B only if both strings match the regular expression translated from the node A expression. Table six is a set of example rules for translating URL pattern expressions provided by an embodiment of the present invention.
Table six
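The containment comparison can be sketched as follows. Since the translation rules of Table six are not reproduced here, the sample values chosen for each mark ('0'/'9' for \d, 'a'/'z' for \c, 'a'/'9' for \w, and one or three repetitions for "+") are assumptions, as are all function names.

```python
import re

_TOKEN = re.compile(r"\\([cdw])(\+?)|(.)", re.S)
_CLASS = {"c": "[A-Za-z]", "d": "[0-9]", "w": "[0-9A-Za-z]"}
_SAMPLE = {"c": ("a", "z"), "d": ("0", "9"), "w": ("a", "9")}  # assumed values

def to_regex(pattern: str):
    """Translate a URL pattern expression into a compiled regular expression."""
    parts = []
    for m in _TOKEN.finditer(pattern):
        kind, plus, literal = m.groups()
        parts.append(_CLASS[kind] + plus if kind else re.escape(literal))
    return re.compile("".join(parts))

def to_samples(pattern: str):
    """Translate a URL pattern expression into two ordinary sample strings."""
    samples = []
    for k, repeat in enumerate((1, 3)):
        out = []
        for m in _TOKEN.finditer(pattern):
            kind, plus, literal = m.groups()
            out.append(_SAMPLE[kind][k] * (repeat if plus else 1) if kind else literal)
        samples.append("".join(out))
    return samples

def contains(a: str, b: str) -> bool:
    """True if every sample string of b matches the regex translated from a."""
    regex = to_regex(a)
    return all(regex.fullmatch(s) for s in to_samples(b))

def compare(a: str, b: str) -> int:
    """1 if A contains B, -1 if B contains A, 0 if neither contains the other."""
    if contains(a, b):
        return 1
    if contains(b, a):
        return -1
    return 0

print(compare(r"/cate/\d+/", r"/cate/1087\d\d/"))
```

In this sketch two identical expressions compare as 1, because each trivially contains the other; how equal nodes are treated in the embodiment is not stated in the text.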
More specifically, an embodiment of the present invention provides a way of merging pattern trees, comprising:
In order from shallow to deep, the nodes of the sub-pattern tree are compared in turn with the nodes of the URL family pattern tree, where a sub-pattern tree node is a URL pattern in the sub-pattern tree and a URL family pattern tree node is a URL pattern in the URL family pattern tree. If there is neither a containing nor a contained relationship between a node of the sub-pattern tree and a node of the URL family pattern tree, the sub-pattern tree node and the URL family pattern tree node are merged directly. If the node of the sub-pattern tree contains the node of the URL family pattern tree, the child nodes of the sub-pattern tree under that node are compared with the node of the URL family, until the sub-pattern tree has been merged into the URL family pattern tree. If the node of the URL family pattern tree contains the node of the sub-pattern tree, the child nodes of the URL family pattern tree are compared with the node of the sub-pattern tree, until the sub-pattern tree has been merged into the URL family pattern tree.
Merging the sub-pattern tree with the URL family pattern tree is in fact a recursive operation, with the following three possible cases:
Case 1: The two pattern trees are compared starting from their root nodes. When the root-node comparison result is 0, the sub-pattern tree is merged directly with the URL family pattern tree. Fig. 2 is the first schematic diagram of pattern tree merging provided by an embodiment of the present invention; as shown in Fig. 2, the root nodes of the two pattern trees are merged directly.
Case 2: When the root-node comparison result is -1, a child pattern tree of the URL family pattern tree is taken as the new pattern tree to be compared and is compared with the sub-pattern tree again. These steps are repeated until the sub-pattern tree has been merged into the URL family pattern tree. Fig. 3 is the second schematic diagram of pattern tree merging provided by an embodiment of the present invention; as shown in Fig. 3, the root node of the sub-pattern tree finally becomes a URL pattern node in the URL family pattern tree, and the sub-pattern tree is merged into the URL family pattern tree as one of its child pattern trees.
Case 3: When the root-node comparison result is 1, the roles of the sub-pattern tree and the URL family pattern tree are exchanged and the steps of Case 2 are repeated. That is, a child pattern tree of the sub-pattern tree is taken as the new pattern tree to be compared and is compared with the URL family pattern tree again, and these steps are repeated until the sub-pattern tree and the URL family pattern tree are merged.
An embodiment of the present invention provides a specific example of merging a sub-pattern tree with a URL family pattern tree, in which a sub-pattern tree is given. [Sub-pattern tree diagram not reproduced.]
After the two are merged, a new URL family pattern tree is obtained. [Merged pattern tree diagram not reproduced.]
After the N sub-pattern trees obtained by pairwise comparison of the URL to be detected with the N history URLs have all been merged into the URL family pattern tree, the representative pattern of the URL to be detected is extracted from the URL family pattern tree. Specifically, an embodiment of the present invention provides a method for determining the representative pattern of the URL to be detected, comprising: starting from the URL to be detected, traversing the pattern tree of the URL family layer by layer from deep to shallow; when a pattern node is found under which the number of URLs that have not undergone abstraction exceeds a preset threshold, that pattern node is the representative pattern of the URL to be detected. The embodiment is illustrated with the new URL family pattern tree obtained in the above example by merging the sub-pattern tree with the URL family pattern tree. Note that this new URL family pattern tree is not the tree from which the representative pattern is extracted in practice; the tree used in practice is the URL family pattern tree after all N sub-pattern trees have been merged. Suppose the URL to be detected is /cate/108709/. In the new URL family pattern tree above, each of its corresponding URL patterns, ordered by level from deep to shallow, has a certain number of un-abstracted URLs beneath it. [Listing of patterns and URL counts not reproduced.]
As can be seen, one URL to be detected has URL patterns at different levels. Preferably, the representative pattern is the deepest pattern under which the number of un-abstracted URLs sharing that pattern reaches the preset threshold. For example, when the preset threshold is 2, the representative pattern is /cate/10870\d/; when the preset threshold is 3, the representative pattern is /\c+/\d+/. Optionally, the preset threshold can be set from experience or actual conditions, or obtained by calculation. Preferably, an embodiment of the present invention provides a method for calculating the preset threshold T, as shown in Formula one:
T = max(3, number of URLs / max(depth of the family pattern tree, 1))    (Formula one)
As Formula one shows, T is never less than 3.
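Formula one and the deep-to-shallow selection of the representative pattern can be sketched as below. The counts in `chain` are made-up illustration values chosen to agree with the two threshold examples in the text, not the figures of the original listing; the function names are hypothetical, and "reaches the threshold" is read as greater than or equal.

```python
def preset_threshold(url_count: int, tree_depth: int) -> float:
    """Formula one: T = max(3, URL count / max(tree depth, 1))."""
    return max(3, url_count / max(tree_depth, 1))

def representative_pattern(chain, threshold):
    """Pick the deepest pattern whose un-abstracted URL count reaches the threshold.

    chain: (pattern, count) pairs for one URL, ordered from deep to shallow.
    """
    for pattern, count in chain:
        if count >= threshold:
            return pattern
    return None  # no pattern on this branch is populated enough yet

# Hypothetical counts for the URL /cate/108709/ (illustration only).
chain = [
    ("/cate/108709/", 1),
    (r"/cate/10870\d/", 2),
    (r"/\c+/\d+/", 3),
]
print(representative_pattern(chain, 2))
print(representative_pattern(chain, preset_threshold(10, 4)))
```

Dividing by max(depth, 1) guards against a zero depth, and the outer max keeps T at 3 or more even for small families.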
After the representative pattern is determined, it must further be determined whether the representative pattern has already been tested. If any history URL under the representative pattern has been tested, the representative pattern is considered to have been tested. When the representative pattern has been tested, its detection result is used as the detection result of the URL to be detected. If the representative pattern has not been tested, the URL to be detected is tested. Table seven is an example of testing a URL to be detected provided by an embodiment of the present invention. As shown in Table seven, for the URL to be detected /cate/108709/: when its representative pattern is /cate/10870\d, the extracted parameter is 9, and a payload is built from the parameter 9 for URL detection; when its representative pattern is /cate/\d\d\d\d\d\d, the extracted parameter is 108709, and a payload is built from the parameter 108709 for URL detection; when its representative pattern is /\c\c\c\c/\d\d\d\d\d\d, the extracted parameters are cate and 108709, and a payload is built from the parameters cate and 108709 for URL detection. Further, after the URL to be detected has been tested, its detection result is used as the detection result of the representative pattern.
Table seven
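Parameter extraction driven by a representative pattern can be sketched as follows: each run of abstract marks in the pattern becomes one capture group, so the literal parts of the URL are skipped and only the abstracted parts are returned. The function name is hypothetical, the patterns are written with the trailing slash of the URL so that the whole path can be matched, and payload construction is not shown.

```python
import re

_TOKEN = re.compile(r"\\([cdw])(\+?)|(.)", re.S)
_CLASS = {"c": "[A-Za-z]", "d": "[0-9]", "w": "[0-9A-Za-z]"}

def extract_parameters(pattern: str, url_path: str):
    """Return the substrings of url_path that fall on abstracted parts of pattern."""
    regex, in_group = [], False
    for m in _TOKEN.finditer(pattern):
        kind, plus, literal = m.groups()
        if kind:
            if not in_group:          # open a group at the start of a run of marks
                regex.append("(")
                in_group = True
            regex.append(_CLASS[kind] + plus)
        else:
            if in_group:              # a literal character ends the current run
                regex.append(")")
                in_group = False
            regex.append(re.escape(literal))
    if in_group:
        regex.append(")")
    match = re.fullmatch("".join(regex), url_path)
    return list(match.groups()) if match else None

url = "/cate/108709/"
print(extract_parameters(r"/cate/10870\d/", url))
print(extract_parameters(r"/cate/\d\d\d\d\d\d/", url))
print(extract_parameters(r"/\c\c\c\c/\d\d\d\d\d\d/", url))
```

A deeper representative pattern leaves fewer characters abstract, so fewer and shorter parameters are extracted for the same URL.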
In summary, the embodiments of the present invention provide a URL detection method, comprising: obtaining path information of a URL to be detected, the URL to be detected being a URL that has undergone rewriting; performing abstraction on the path information of the URL to be detected to obtain its primary pattern; determining, according to the primary pattern, the URL family to which the URL to be detected belongs, the URL family having the same primary pattern as the URL to be detected; merging the URL to be detected into the URL family; selecting the representative pattern of the URL to be detected from the URL family; extracting parameters from the URL to be detected according to the representative pattern and performing detection; and taking the detection result of the URL to be detected as the detection result of the representative pattern. Abstracting the path information of the URL to be detected yields its primary pattern, which is a relatively coarse feature of the URL. The URL is merged, according to the primary pattern, into the URL family to which it belongs, whose URLs share the same coarse feature. Within the URL family, this coarse feature is subdivided under the primary pattern into multiple URL pattern branches. One of these URL patterns, the representative pattern, has structural features that represent the structural features of the URL to be detected, and in it the characteristic parts of the URL to be detected have been abstracted. Therefore, when extracting parameters, it suffices to extract them from the URL to be detected according to its abstracted parts, which improves the accuracy of parameter extraction and, in turn, the accuracy of URL detection.
Based on the same technical concept, an embodiment of the present invention also provides a detection device that can execute the above method embodiments. Fig. 4 is a schematic structural diagram of a detection device provided by an embodiment of the present invention. As shown in Fig. 4, the detection device 400 comprises:
an acquisition module 401, configured to obtain path information of a URL to be detected, the URL to be detected being a URL that has undergone rewriting;
an abstraction module 402, configured to perform abstraction on the path information of the URL to be detected to obtain the primary pattern of the URL to be detected;
a query module 403, configured to determine, according to the primary pattern of the URL to be detected, the URL family to which the URL to be detected belongs, the URL family having the same primary pattern as the URL to be detected;
a merging module 404, configured to merge the URL to be detected into the URL family;
an extraction module 405, configured to select the representative pattern of the URL to be detected from the URL family;
a processing module 406, configured to extract parameters from the URL to be detected according to the representative pattern and perform detection, and to take the detection result of the URL to be detected as the detection result of the representative pattern.
Optionally, the abstraction module 402 is specifically configured to:
convert, by regex replacement, the non-special characters in the URL to be detected into letters or digits, the non-special characters including characters in the URL to be detected that do not act as separators;
abstract, according to set rules, the character string between each pair of separators in the URL to be detected after the regex replacement.
Optionally, the abstraction module 402 is specifically configured to:
if the character string between separators is an alphabetic string, abstract the alphabetic string as a first mark;
if the character string between separators is a digit string, abstract the digit string as a second mark;
if the character string between separators consists of digits and letters, abstract it as a third mark.
Optionally, the URL family is a pattern tree built from the URLs in the URL family by layer-by-layer abstraction;
the merging module 404 is specifically configured to:
obtain N history URLs from the URL family, N being a positive integer;
process each history URL as follows:
compare the URL to be detected with the history URL pairwise to obtain the differences between the URL to be detected and the history URL;
abstract the differences between the URL to be detected and the history URL layer by layer to construct a sub-pattern tree between the URL to be detected and the history URL;
merge the sub-pattern tree into the URL family.
Optionally, the extraction module 405 is specifically configured to:
starting from the URL to be detected, traverse the pattern tree of the URL family layer by layer from deep to shallow;
when the pattern tree contains a pattern node under which the number of URLs that have not undergone abstraction exceeds a preset threshold, take that pattern node as the representative pattern of the URL to be detected.
Optionally, the processing module 406 is further configured to:
if the representative pattern has been tested, use the detection result of the representative pattern as the detection result of the URL to be detected.
Optionally, the processing module 406 is further configured to:
if none of the existing URL families has the same primary pattern as the URL to be detected, store the primary pattern of the URL to be detected as a newly added URL family;
test the URL to be detected.
In summary, the embodiments of the present invention provide a uniform resource locator (URL) detection method and detection device, comprising: obtaining path information of a URL to be detected, the URL to be detected being a URL that has undergone rewriting; performing abstraction on the path information of the URL to be detected to obtain its primary pattern; determining, according to the primary pattern, the URL family to which the URL to be detected belongs, the URL family having the same primary pattern as the URL to be detected; merging the URL to be detected into the URL family; selecting the representative pattern of the URL to be detected from the URL family; extracting parameters from the URL to be detected according to the representative pattern and performing detection; and taking the detection result of the URL to be detected as the detection result of the representative pattern. Abstracting the path information of the URL to be detected yields its primary pattern, which is a relatively coarse feature of the URL. The URL is merged, according to the primary pattern, into the URL family to which it belongs, whose URLs share the same coarse feature. Within the URL family, this coarse feature is subdivided under the primary pattern into multiple URL pattern branches. One of these URL patterns, the representative pattern, has structural features that represent the structural features of the URL to be detected, and in it the characteristic parts of the URL to be detected have been abstracted. Therefore, when extracting parameters, it suffices to extract them from the URL to be detected according to its abstracted parts, which improves the accuracy of parameter extraction and, in turn, the accuracy of URL detection.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art, once they grasp the basic inventive concept, can make further changes and modifications to these embodiments. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (16)

1. A uniform resource locator (URL) detection method, characterized by comprising:
obtaining path information of a URL to be detected, the URL to be detected being a URL that has undergone rewriting;
performing abstraction on the path information of the URL to be detected to obtain a primary pattern of the URL to be detected;
determining, according to the primary pattern of the URL to be detected, the URL family to which the URL to be detected belongs, the URL family having the same primary pattern as the URL to be detected;
merging the URL to be detected into the URL family;
selecting a representative pattern of the URL to be detected from the URL family;
extracting parameters from the URL to be detected according to the representative pattern and performing detection, and taking the detection result of the URL to be detected as the detection result of the representative pattern.
2. The method according to claim 1, characterized in that performing abstraction on the path information of the URL to be detected to obtain the primary pattern of the URL to be detected comprises:
converting, by regex replacement, the non-special characters in the URL to be detected into letters or digits, the non-special characters including characters in the URL to be detected that do not act as separators;
abstracting, according to set rules, the character string between each pair of separators in the URL to be detected after the regex replacement.
3. The method according to claim 2, characterized in that abstracting, according to set rules, the character string between each pair of separators in the URL to be detected after the regex replacement comprises:
if the character string between separators is an alphabetic string, abstracting the alphabetic string as a first mark;
if the character string between separators is a digit string, abstracting the digit string as a second mark;
if the character string between separators consists of digits and letters, abstracting it as a third mark.
4. The method according to claim 1, characterized in that merging the URL to be detected into the URL family comprises:
the URL family being a pattern tree built from the URLs in the URL family by layer-by-layer abstraction;
obtaining N history URLs from the URL family, N being a positive integer;
processing each history URL as follows:
comparing the URL to be detected with the history URL pairwise to obtain the differences between the URL to be detected and the history URL;
abstracting the differences between the URL to be detected and the history URL layer by layer to construct a sub-pattern tree between the URL to be detected and the history URL;
merging the sub-pattern tree into the URL family.
5. The method according to claim 4, characterized in that merging the sub-pattern tree into the URL family comprises:
comparing, in order from shallow to deep, the nodes of the sub-pattern tree in turn with the nodes of the URL family pattern tree, a sub-pattern tree node being a URL pattern in the sub-pattern tree and a URL family pattern tree node being a URL pattern in the URL family pattern tree;
if there is neither a containing nor a contained relationship between the node of the sub-pattern tree and the node of the URL family pattern tree, merging the sub-pattern tree node and the URL family pattern tree node directly;
if the node of the sub-pattern tree contains the node of the URL family pattern tree, comparing the child nodes of the sub-pattern tree under that node with the node of the URL family, until the sub-pattern tree has been merged into the URL family pattern tree;
if the node of the URL family pattern tree contains the node of the sub-pattern tree, comparing the child nodes of the URL family pattern tree with the node of the sub-pattern tree, until the sub-pattern tree has been merged into the URL family pattern tree.
6. The method according to claim 4, characterized in that selecting the representative pattern of the URL to be detected from the URL family comprises:
starting from the URL to be detected, traversing the pattern tree of the URL family layer by layer from deep to shallow;
when the pattern tree contains a pattern node under which the number of URLs that have not undergone abstraction exceeds a preset threshold, taking that pattern node as the representative pattern of the URL to be detected.
7. The method according to any one of claims 1 to 6, characterized by further comprising:
if the representative pattern has been tested, using the detection result of the representative pattern as the detection result of the URL to be detected.
8. The method according to any one of claims 1 to 6, characterized by further comprising:
if none of the existing URL families has the same primary pattern as the URL to be detected, storing the primary pattern of the URL to be detected as a newly added URL family;
testing the URL to be detected.
9. A uniform resource locator (URL) detection device, characterized by comprising:
an acquisition module, configured to obtain path information of a URL to be detected, the URL to be detected being a URL that has undergone rewriting;
an abstraction module, configured to perform abstraction on the path information of the URL to be detected to obtain a primary pattern of the URL to be detected;
a query module, configured to determine, according to the primary pattern of the URL to be detected, the URL family to which the URL to be detected belongs, the URL family having the same primary pattern as the URL to be detected;
a merging module, configured to merge the URL to be detected into the URL family;
an extraction module, configured to select a representative pattern of the URL to be detected from the URL family;
a processing module, configured to extract parameters from the URL to be detected according to the representative pattern and perform detection, and to take the detection result of the URL to be detected as the detection result of the representative pattern.
10. The device according to claim 9, characterized in that:
the abstraction module is specifically configured to:
convert, by regex replacement, the non-special characters in the URL to be detected into letters or digits, the non-special characters including characters in the URL to be detected that do not act as separators;
abstract, according to set rules, the character string between each pair of separators in the URL to be detected after the regex replacement.
11. The device according to claim 10, characterized in that:
the abstraction module is specifically configured to:
if the character string between separators is an alphabetic string, abstract the alphabetic string as a first mark;
if the character string between separators is a digit string, abstract the digit string as a second mark;
if the character string between separators consists of digits and letters, abstract it as a third mark.
12. The device according to claim 9, characterized in that:
the URL family is a pattern tree built from the URLs in the URL family by layer-by-layer abstraction;
the merging module is specifically configured to:
obtain N history URLs from the URL family, N being a positive integer;
process each history URL as follows:
compare the URL to be detected with the history URL pairwise to obtain the differences between the URL to be detected and the history URL;
abstract the differences between the URL to be detected and the history URL layer by layer to construct a sub-pattern tree between the URL to be detected and the history URL;
merge the sub-pattern tree into the URL family.
13. The device according to claim 12, characterized in that:
the merging module is specifically configured to:
compare, in order from shallow to deep, the nodes of the sub-pattern tree in turn with the nodes of the URL family pattern tree, a sub-pattern tree node being a URL pattern in the sub-pattern tree and a URL family pattern tree node being a URL pattern in the URL family pattern tree;
if there is neither a containing nor a contained relationship between the node of the sub-pattern tree and the node of the URL family pattern tree, merge the sub-pattern tree node and the URL family pattern tree node directly;
if the node of the sub-pattern tree contains the node of the URL family pattern tree, compare the child nodes of the sub-pattern tree under that node with the node of the URL family, until the sub-pattern tree has been merged into the URL family pattern tree;
if the node of the URL family pattern tree contains the node of the sub-pattern tree, compare the child nodes of the URL family pattern tree with the node of the sub-pattern tree, until the sub-pattern tree has been merged into the URL family pattern tree.
14. The device according to claim 13, characterized in that:
the extraction module is specifically configured to:
starting from the URL to be detected, traverse the pattern tree of the URL family layer by layer from deep to shallow;
when the pattern tree contains a pattern node under which the number of URLs that have not undergone abstraction exceeds a preset threshold, take that pattern node as the representative pattern of the URL to be detected.
15. The device according to any one of claims 9 to 14, characterized in that:
the processing module is further configured to:
if the representative pattern has been tested, use the detection result of the representative pattern as the detection result of the URL to be detected.
16. The device according to any one of claims 9 to 14, characterized in that:
the processing module is further configured to:
if none of the existing URL families has the same primary pattern as the URL to be detected, store the primary pattern of the URL to be detected as a newly added URL family;
test the URL to be detected.
CN201710108755.7A 2017-02-27 2017-02-27 URL detection method and detection device Active CN106940711B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710108755.7A CN106940711B (en) 2017-02-27 2017-02-27 URL detection method and detection device

Publications (2)

Publication Number Publication Date
CN106940711A (en) 2017-07-11
CN106940711B (en) 2020-02-07

Family

ID=59469693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710108755.7A Active CN106940711B (en) 2017-02-27 2017-02-27 URL detection method and detection device

Country Status (1)

Country Link
CN (1) CN106940711B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101727447A (en) * 2008-10-10 2010-06-09 浙江搜富网络技术有限公司 Generation method and device of regular expression based on URL
CN102222187A (en) * 2011-06-02 2011-10-19 国家计算机病毒应急处理中心 Method for detecting Trojan-injected web pages based on domain name structural features
CN102739679A (en) * 2012-06-29 2012-10-17 东南大学 URL(Uniform Resource Locator) classification-based phishing website detection method
CN104699851A (en) * 2015-04-08 2015-06-10 上海理想信息产业(集团)有限公司 Service tag extension method in big data environment
CN105912573A (en) * 2016-03-30 2016-08-31 北京网康科技有限公司 Data updating method and data updating device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898046A (en) * 2020-07-16 2020-11-06 北京天空卫士网络安全技术有限公司 Redirection management method and device
CN111898046B (en) * 2020-07-16 2024-02-13 北京天空卫士网络安全技术有限公司 Method and device for redirection management
CN111935133A (en) * 2020-08-06 2020-11-13 北京顶象技术有限公司 White list generation method and device
CN114650152A (en) * 2020-12-17 2022-06-21 中国科学院计算机网络信息中心 Method and system for detecting vulnerability of super computing center
CN114650152B (en) * 2020-12-17 2023-06-20 中国科学院计算机网络信息中心 Super computing center vulnerability detection method and system
CN113839940A (en) * 2021-09-18 2021-12-24 北京知道创宇信息技术股份有限公司 URL pattern tree-based defense method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN106940711B (en) 2020-02-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 3rd Floor, Yitai Building, No. 4 Beiwa Road, Haidian District, Beijing 100089

Patentee after: NSFOCUS Technologies Group Co.,Ltd.

Patentee after: NSFOCUS TECHNOLOGIES Inc.

Address before: 3rd Floor, Yitai Building, No. 4 Beiwa Road, Haidian District, Beijing 100089

Patentee before: NSFOCUS INFORMATION TECHNOLOGY Co.,Ltd.

Patentee before: NSFOCUS TECHNOLOGIES Inc.
