CN106940711A - URL detection method and detection device
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9566—URL specific, e.g. using aliases, detecting broken or misspelled links
Abstract
An embodiment of the invention discloses a URL detection method and detection device that address the inability of the prior art to extract parameters accurately. The method includes: obtaining the path information of a URL to be detected; abstracting the path information to obtain the primary pattern of the URL to be detected; determining, from the primary pattern, the URL family to which the URL to be detected belongs, the URL family having the same primary pattern as the URL to be detected; merging the URL to be detected into the URL family; selecting a representative pattern for the URL to be detected from the URL family; and extracting the parameters of the URL to be detected according to the representative pattern, performing detection on them, and using the detection result of the URL to be detected as the detection result of the representative pattern. Parameters need only be extracted from the parts of the URL to be detected that the representative pattern abstracts, which improves the accuracy of parameter extraction and, in turn, the accuracy of URL detection.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a URL detection method and detection device.
Background
When a web application is scanned for risk assessment, every possible point of exposure has to be examined. URLs (Uniform Resource Locators) are one such point where vulnerabilities may exist, but a massive number of URLs makes the scanning process redundant and complicated, or even impossible to complete. In general, URLs of the same type share the same vulnerabilities, so deduplicating URLs of the same type helps assess a website's vulnerabilities more efficiently, that is, with less repeated detection. In practice, detection extracts parameters and parameter values to construct payload data (payload), and pages, directories, parameter names and even parameter values can also be deduplicated according to the detection rules.
The prior art only handles links in the traditional canonical form and relies on special characters in the URL. For example, for links such as http://www.test.com/admin/easycheck/exerecord/?batch_id=28 and http://www.test.com/admin/easycheck/exerecord/?batch_id=29, the "?" in the URL identifies the query string and the "&" within the query string separates the parameters, which is enough to extract and deduplicate parameters. In this example the parameter of both URLs is batch_id, with values 28 and 29 respectively; for detection logic such as cross-site scripting (Cross Site Script, XSS), only one of the two URLs needs to be detected.
However, many websites on the internet use URL rewriting (URL Rewriting). For example, http://www.somebloghost.com/Blogs/Posts.php?Year=2006&Month=12&Day=10 may become http://www.somebloghost.com/Blogs/2006/12/10/ after rewriting, and the traditional way of extracting and deduplicating parameters no longer applies. Worse still, for a URL such as https://www.oschina.net/news/74686/chandao-8-2-3 it is even harder to tell which parts are the variable, actual parameters. When a risk assessment is run against a large website that uses URL rewriting (for example Zhihu or JD.com), the site is huge and every URL path is different. If the URLs cannot be deduplicated effectively, the scan becomes highly repetitive, redundant and inefficient, and may be impossible to complete; if parameters cannot be extracted effectively, the scan becomes aimless, with extremely low accuracy, and serves no purpose.
In short, the prior art cannot accurately extract parameters from rewritten URLs.
Summary of the invention
The present invention provides a URL detection method and detection device to solve the problem in the prior art that rewritten URLs cannot be deduplicated accurately, which makes detection inefficient.
An embodiment of the present invention provides a uniform resource locator (URL) detection method, including:
obtaining the path information of a URL to be detected, the URL to be detected being a URL that has undergone rewriting;
abstracting the path information of the URL to be detected to obtain the primary pattern of the URL to be detected;
determining, according to the primary pattern of the URL to be detected, the URL family to which the URL to be detected belongs, the URL family having the same primary pattern as the URL to be detected;
merging the URL to be detected into the URL family;
selecting a representative pattern for the URL to be detected from the URL family;
extracting the parameters of the URL to be detected according to the representative pattern and performing detection on them, and using the detection result of the URL to be detected as the detection result of the representative pattern.
Optionally, abstracting the path information of the URL to be detected to obtain the primary pattern of the URL to be detected includes:
converting, by regular-expression replacement, the non-special characters in the URL to be detected into letters or digits, the non-special characters including the characters in the URL to be detected that do not act as separators;
abstracting, according to set rules, the character string between each pair of separators in the URL to be detected after the regular-expression replacement.
Optionally, abstracting, according to set rules, the character string between each pair of separators in the URL to be detected after the regular-expression replacement includes:
if the character string between separators is a string of letters, abstracting the string of letters as a first identifier;
if the character string between separators is a string of digits, abstracting the string of digits as a second identifier;
if the character string between separators is made up of both digits and letters, abstracting it as a third identifier.
Optionally, merging the URL to be detected into the URL family includes the following, where the URL family is a pattern tree built from the URLs in the family by layer-by-layer abstraction:
obtaining N historical URLs from the URL family, N being a positive integer;
performing the following processing for each historical URL:
comparing the URL to be detected with the historical URL pairwise to obtain the differences between the URL to be detected and the historical URL;
abstracting the differences between the URL to be detected and the historical URL layer by layer, thereby constructing a sub-pattern tree between the URL to be detected and the historical URL;
merging the sub-pattern tree into the URL family.
Optionally, merging the sub-pattern tree into the URL family includes: comparing, in order from shallow to deep, the nodes of the sub-pattern tree with the nodes of the URL family pattern tree one by one, where a sub-pattern tree node is a URL pattern in the sub-pattern tree and a URL family pattern tree node is a URL pattern in the URL family pattern tree;
if neither the node of the sub-pattern tree nor the node of the URL family pattern tree contains the other, merging the sub-pattern tree node directly with the URL family pattern tree node;
if the node of the sub-pattern tree contains the node of the URL family pattern tree, comparing the child nodes of the sub-pattern tree under that node with the node of the URL family pattern tree, until the sub-pattern tree has been merged into the URL family pattern tree;
if the node of the URL family pattern tree contains the node of the sub-pattern tree, comparing the child nodes of the URL family pattern tree with the node of the sub-pattern tree, until the sub-pattern tree has been merged into the URL family pattern tree.
Optionally, selecting the representative pattern of the URL to be detected from the URL family includes:
starting from the URL to be detected, traversing the pattern tree of the URL family layer by layer from deep to shallow;
when a pattern node is reached under which the number of non-abstracted URLs in the pattern tree exceeds a preset threshold, taking that pattern node as the representative pattern of the URL to be detected.
Optionally, the method further includes:
if the representative pattern has already been detected, using the detection result of the representative pattern as the detection result of the URL to be detected.
Optionally, the method further includes:
if none of the existing URL families has the same primary pattern as the URL to be detected, storing the primary pattern of the URL to be detected as a newly added URL family;
performing detection on the URL to be detected.
An embodiment of the present invention provides a uniform resource locator (URL) detection device, including:
an acquisition module, configured to obtain the path information of a URL to be detected, the URL to be detected being a URL that has undergone rewriting;
an abstraction module, configured to abstract the path information of the URL to be detected to obtain the primary pattern of the URL to be detected;
a query module, configured to determine, according to the primary pattern of the URL to be detected, the URL family to which the URL to be detected belongs, the URL family having the same primary pattern as the URL to be detected;
a merging module, configured to merge the URL to be detected into the URL family;
an extraction module, configured to select a representative pattern for the URL to be detected from the URL family;
a processing module, configured to extract the parameters of the URL to be detected according to the representative pattern and perform detection on them, and to use the detection result of the URL to be detected as the detection result of the representative pattern.
Optionally, the abstraction module is specifically configured to:
convert, by regular-expression replacement, the non-special characters in the URL to be detected into letters or digits, the non-special characters including the characters in the URL to be detected that do not act as separators;
abstract, according to set rules, the character string between each pair of separators in the URL to be detected after the regular-expression replacement.
Optionally, the abstraction module is specifically configured to:
if the character string between separators is a string of letters, abstract the string of letters as a first identifier;
if the character string between separators is a string of digits, abstract the string of digits as a second identifier;
if the character string between separators is made up of both digits and letters, abstract it as a third identifier.
Optionally, the URL family is a pattern tree built from the URLs in the family by layer-by-layer abstraction;
the merging module is specifically configured to:
obtain N historical URLs from the URL family, N being a positive integer;
perform the following processing for each historical URL:
compare the URL to be detected with the historical URL pairwise to obtain the differences between the URL to be detected and the historical URL;
abstract the differences between the URL to be detected and the historical URL layer by layer, thereby constructing a sub-pattern tree between the URL to be detected and the historical URL;
merge the sub-pattern tree into the URL family.
Optionally, the merging module is specifically configured to:
compare, in order from shallow to deep, the nodes of the sub-pattern tree with the nodes of the URL family pattern tree one by one, where a sub-pattern tree node is a URL pattern in the sub-pattern tree and a URL family pattern tree node is a URL pattern in the URL family pattern tree;
if neither the node of the sub-pattern tree nor the node of the URL family pattern tree contains the other, merge the sub-pattern tree node directly with the URL family pattern tree node;
if the node of the sub-pattern tree contains the node of the URL family pattern tree, compare the child nodes of the sub-pattern tree under that node with the node of the URL family pattern tree, until the sub-pattern tree has been merged into the URL family pattern tree;
if the node of the URL family pattern tree contains the node of the sub-pattern tree, compare the child nodes of the URL family pattern tree with the node of the sub-pattern tree, until the sub-pattern tree has been merged into the URL family pattern tree.
Optionally, the extraction module is specifically configured to:
starting from the URL to be detected, traverse the pattern tree of the URL family layer by layer from deep to shallow;
when a pattern node is reached under which the number of non-abstracted URLs in the pattern tree exceeds a preset threshold, take that pattern node as the representative pattern of the URL to be detected.
Optionally, the processing module is specifically configured to:
if the representative pattern has already been detected, use the detection result of the representative pattern as the detection result of the URL to be detected.
Optionally, the processing module is further configured to:
if none of the existing URL families has the same primary pattern as the URL to be detected, store the primary pattern of the URL to be detected as a newly added URL family;
perform detection on the URL to be detected.
In summary, the embodiments of the present invention provide a URL detection method and detection device, including: obtaining the path information of a URL to be detected, the URL to be detected being a URL that has undergone rewriting; abstracting the path information of the URL to be detected to obtain its primary pattern; determining, according to the primary pattern, the URL family to which the URL to be detected belongs, the URL family having the same primary pattern as the URL to be detected; merging the URL to be detected into the URL family; selecting a representative pattern for the URL to be detected from the URL family; and extracting the parameters of the URL to be detected according to the representative pattern, performing detection on them, and using the detection result of the URL to be detected as the detection result of the representative pattern. Abstracting the path information of the URL to be detected yields its primary pattern, which is a relatively coarse feature of the URL. The URL is merged, according to the primary pattern, into the URL family to which it belongs; the URLs in that family share the same coarse feature. Within the family, this coarse feature is subdivided under the primary pattern into multiple URL pattern branches. The structural features of one of these URL patterns, the representative pattern, can stand for the structural features of the URL to be detected, and the representative pattern abstracts the variable parts of the URL to be detected. Parameters therefore need only be extracted from the abstracted parts of the URL to be detected, which improves the accuracy of parameter extraction and, in turn, the accuracy of URL detection.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a URL detection method provided in an embodiment of the present invention;
Fig. 2 is the first schematic diagram of pattern tree merging provided in an embodiment of the present invention;
Fig. 3 is the second schematic diagram of pattern tree merging provided in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a detection device provided in an embodiment of the present invention.
Detailed description
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a schematic flowchart of a URL detection method provided in an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
S101: obtain the path information of a URL to be detected, the URL to be detected being a URL that has undergone rewriting;
S102: abstract the path information of the URL to be detected to obtain the primary pattern of the URL to be detected;
S103: determine, according to the primary pattern of the URL to be detected, the URL family to which the URL to be detected belongs, the URL family having the same primary pattern as the URL to be detected;
S104: merge the URL to be detected into the URL family;
S105: select a representative pattern for the URL to be detected from the URL family;
S106: extract the parameters of the URL to be detected according to the representative pattern and perform detection on them, and use the detection result of the URL to be detected as the detection result of the representative pattern.
In a specific implementation, the URL to be detected in the embodiments of the present invention is a URL that has undergone URL rewriting. It is any one of the massive set of URLs that need to be detected; once detection of this URL is complete, another URL that has not yet been detected can be taken from the set as the URL to be detected and the above steps repeated.
A rewritten URL can be decomposed into a protocol name (schema), an address field (host), a port (port) and a path (path). It should be pointed out that the embodiments of the present invention presuppose that all URLs share the same schema, host and port information: URLs with the same schema, host and port are gathered into one processing set, and the URLs in that set are detected with the detection method disclosed in the embodiments of the present invention. Therefore, before detection, the URLs first need to be grouped by their schema, host and port information. In general, URLs from the same source have the same schema, host and port information, so the URLs can also be grouped directly by source, after which each group of URLs is detected with the method disclosed in the embodiments of the present invention.
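The grouping step described above can be illustrated with a minimal Python sketch. It assumes the standard-library `urllib.parse.urlsplit` is an acceptable way to decompose a URL; the function name `group_by_origin` is hypothetical and not part of the described method.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_origin(urls):
    """Group URL paths into processing sets keyed by (schema, host, port)."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        # parts.port is None when the URL carries no explicit port.
        key = (parts.scheme, parts.hostname, parts.port)
        groups[key].append(parts.path)
    return dict(groups)
```

Each resulting list of paths would then be handled as one processing set by steps S101 to S106.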
In the specific implementation of S101, the path information of the URL to be detected contains its parameters and parameter values, so the path information is the main object processed by the embodiments of the present invention. The path information can be extracted with an existing path identification method.
In the specific implementation of S102, abstracting the path information of the URL to be detected means abstracting the concrete content of that path information. Preferably, letters, digits and combinations of the two in the path are distinguished by different identifiers, which brings out the compositional structure of the path information.
In the specific implementation of S103, the already detected URLs that share a primary pattern make up the URL family to which those URLs belong. Because many URLs have been detected and they can have different primary patterns, multiple URL families exist correspondingly. After the primary pattern of the URL to be detected is obtained, each URL family is traversed; if a URL family has the same primary pattern as the URL to be detected, that family is the URL family to which the URL to be detected belongs. Preferably, if none of the existing URL families has the same primary pattern as the URL to be detected, the primary pattern of the URL to be detected is stored as a newly added URL family, and the URL to be detected is detected.
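The family lookup in S103 can be sketched in Python as a dictionary keyed by primary pattern. This is only an illustration under assumptions: the source does not say how families are stored, and the name `assign_to_family` is hypothetical.

```python
def assign_to_family(families, primary, path):
    """Add `path` to the family keyed by its primary pattern.

    `families` maps a primary pattern to the list of paths in that family.
    Returns True when no family with this primary pattern existed yet,
    in which case the caller would go on to detect the URL directly.
    """
    is_new = primary not in families
    families.setdefault(primary, []).append(path)
    return is_new
```

A dictionary lookup replaces the traversal of every family described in the text; the outcome is the same, since a family is identified solely by its primary pattern.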
In the specific implementation of S104, the URL family is the family to which the URL to be detected belongs. The two have the same primary pattern, so the URL to be detected can always be merged into its family after some degree of abstraction. Preferably, historical URLs are taken from the URL family and compared with the URL to be detected, and the local differences between them are abstracted layer by layer until an identical pattern is found; the URL to be detected and the intermediate patterns generated during the layer-by-layer abstraction are then merged into the URL family together.
In the specific implementation of S105, the URL family processed by S104 contains multiple URLs and URL patterns, and the representative pattern of the URL to be detected has to be chosen from the URL patterns merged into the family in S104. Optionally, the representative pattern can be chosen according to the depth of a URL pattern and the number of leaf nodes under it. The depth of a URL pattern is the number of concretization layers between the primary pattern and that URL pattern when going, layer by layer, from the primary pattern to the URL to be detected; the more layers of concretization, the deeper the URL pattern. The number of leaf nodes under a URL pattern is the number of concrete URLs under that pattern.
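The selection rule can be sketched in Python as a walk from the leaf that holds the URL to be detected up towards the primary pattern. The tree layout, the class `PatternNode` and the function `representative_pattern` are assumptions made for illustration; the source does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class PatternNode:
    """A node of a family pattern tree; leaves are concrete URL paths."""
    pattern: str
    parent: Optional["PatternNode"] = None
    children: List["PatternNode"] = field(default_factory=list)

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def leaf_count(self):
        # Number of concrete (non-abstracted) URLs under this node.
        if not self.children:
            return 1
        return sum(c.leaf_count() for c in self.children)


def representative_pattern(leaf, threshold):
    """Walk from deep to shallow and return the first pattern node whose
    number of concrete URLs exceeds `threshold`, or None if none does."""
    node = leaf.parent
    while node is not None:
        if node.leaf_count() > threshold:
            return node
        node = node.parent
    return None
```

Because the walk starts at the deepest pattern, the node returned is the most specific pattern that is still backed by enough concrete URLs.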
In the specific implementation of S106, the structural features of the representative pattern can stand for the structural features of the URL to be detected, and the representative pattern abstracts the variable parts of the URL to be detected. Parameters therefore need only be extracted from the abstracted parts of the URL to be detected, which improves the accuracy of parameter extraction and, in turn, the accuracy of URL detection. For example, if the path of the URL to be detected is /10882/ and its representative pattern is /1088d/, the extracted parameter is 2; if its representative pattern is /108dd/, the extracted parameter is 82.
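A minimal Python sketch of this extraction step follows, under stated assumptions: placeholders are written `\c`, `\d` and `\w` (the example above writes a bare `d`), an optional `+` means one or more, and adjacent placeholders are read as a single parameter. The names `pattern_to_regex` and `extract_params` are hypothetical.

```python
import re

# Assumed character classes for the placeholders \c, \d and \w.
CLASSES = {"c": "[A-Za-z]", "d": "[0-9]", "w": "[0-9A-Za-z]"}
# Either a placeholder with an optional "+", or any single literal character.
TOKEN = re.compile(r"\\([cdw])(\+?)|(.)", re.S)


def pattern_to_regex(pattern):
    """Translate a URL pattern into a regex with one capture group per
    run of adjacent placeholders."""
    parts = []
    in_group = False
    for m in TOKEN.finditer(pattern):
        cls, plus, literal = m.group(1), m.group(2), m.group(3)
        if cls:
            if not in_group:
                parts.append("(")
                in_group = True
            parts.append(CLASSES[cls] + plus)
        else:
            if in_group:
                parts.append(")")
                in_group = False
            parts.append(re.escape(literal))
    if in_group:
        parts.append(")")
    return re.compile("".join(parts))


def extract_params(path, pattern):
    """Return the parameter values of `path` under `pattern`,
    or None when the path does not match the pattern."""
    m = pattern_to_regex(pattern).fullmatch(path)
    return list(m.groups()) if m else None
```

With these assumptions, `extract_params("/10882/", r"/1088\d/")` yields the parameter `2` and `extract_params("/10882/", r"/108\d\d/")` yields `82`, matching the example in the text.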
Preferably, if the representative pattern has already been detected, the detection result of the representative pattern is used as the detection result of the URL to be detected. The detection result of a representative pattern stands for the detection results of all leaf-node URLs under that URL pattern, and it is determined by the detection result of one of those leaf-node URLs. Optionally, if the representative pattern has no detection result, none of the leaf-node URLs under it has been detected yet; in that case the URL to be detected is detected and its detection result is used as the detection result of the URL pattern. This deduplicates the URLs and improves URL detection efficiency.
Abstracting the path information of the URL to be detected yields its primary pattern, which is a relatively coarse feature of the URL. The URL is merged, according to the primary pattern, into the URL family to which it belongs; the URLs in that family share the same coarse feature. Within the family, this coarse feature is subdivided under the primary pattern into multiple URL pattern branches. The detection result of one of these URL patterns, the representative pattern, can stand for the detection result of the URL to be detected: if the representative pattern has already been detected, the URL to be detected does not need to be detected again, which simplifies the detection process and speeds up detection. In addition, the method can run in real time without first building a detection model, so the detection process is simpler.
An embodiment of the present invention provides one feasible way of obtaining the primary pattern of the URL to be detected, including: converting, by regular-expression replacement, the non-special characters in the URL to be detected into letters or digits, the non-special characters including the characters in the URL to be detected that do not act as separators; and abstracting, according to set rules, the character string between each pair of separators in the URL to be detected after the regular-expression replacement. URL path information is made up of letters, digits and other characters. Among these other characters, some act as separators (the special characters), while the rest are part of the information carried by the path. Preferably, the non-special characters in the URL to be detected are converted into letters or digits and only the separator characters are kept; this makes the structural features of the URL more apparent, reduces the difficulty of recognizing the path information, and makes the URL easier to abstract.
Table 1
Regular expression | Replacement |
' [%+!\[\];&]' | '0v0' |
'(<=w) (![a-zA-Z]+$)' | 'v' |
'(<=[a-zA-Z] -) (w+-)+(=d+ $) ' | 'v' |
'(<=[a-zA-Z]) (d* [- _] d*)+(=[a-zA-Z]) ' | '0' |
Table 1 shows one set of regular-expression replacement rules provided in an embodiment of the present invention. As shown in Table 1, some characters in the string are replaced by the digit 0 or the letter v, and some are replaced by a combination of the digit 0 and the letter v. The latter is because these non-special characters are later abstracted as the third identifier, in the same way as strings that mix digits and letters, so replacing them with 0 and v allows uniform processing afterwards. It should be pointed out that 0 and v are merely one example of a digit and a letter; in practice any digit and letter can be used, such as 1 and u, or 2 and b.
After the regular-expression replacement, the path information of the URL to be detected contains only letters, digits and separators. The path information is divided into multiple character strings at the separators, and each character string is abstracted.
More specifically, an embodiment of the present invention provides a method for abstracting, according to set rules, the character string between each pair of separators in the URL to be detected after the regular-expression replacement, including:
if the character string between separators is a string of letters, abstracting the string of letters as a first identifier; if the character string between separators is a string of digits, abstracting the string of digits as a second identifier; if the character string between separators is made up of both digits and letters, abstracting it as a third identifier. A string of letters is a character string made up of alphabetic characters; likewise, a string of digits is a character string made up of numeric characters; a string made up of digits and letters is one whose characters include both numeric and alphabetic characters. Strings of different composition are abstracted with different identifiers. For example, "c" represents an alphabetic character, "d" represents a numeric character, and "w" represents a numeric or alphabetic character; further, "+" represents one or more consecutive characters of the same class. Combining "+" with "c", "d" and "w" gives the first, second and third identifiers respectively; for example, the character string "abcderf" can be abstracted as "c+".
Abstracting every character string in the path information of the URL to be detected finally yields its primary pattern. Table 2 shows an example of primary patterns provided in an embodiment of the present invention. As shown in Table 2, the path information of four different URLs to be detected is abstracted into two primary patterns: /reach296/p/3816387.html and /reach296/p/4001918.html are abstracted as /w+/c+/d+.c+, while /reach296/wlwmanifest.xml and /reach296/Default.html are abstracted as /w+/c+.c+. In general, URLs with the same primary pattern share the same processing logic and can therefore be treated as one family. In Table 2, /reach296/p/3816387.html and /reach296/p/4001918.html can be treated as one family, and /reach296/wlwmanifest.xml and /reach296/Default.html as another.
Table 2
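The derivation of a primary pattern from an already normalized path can be sketched in Python as follows. The sketch assumes that "/" and "." are the only separators and that the regular-expression replacement has already left only letters, digits and separators; the function name `primary_pattern` is hypothetical.

```python
import re


def primary_pattern(path, separators="/."):
    """Abstract each string between separators into c+, d+ or w+."""
    sep_class = "[" + re.escape(separators) + "]"
    # The capturing group keeps the separators in the split result.
    pieces = re.split("(" + sep_class + ")", path)
    out = []
    for piece in pieces:
        if piece == "" or piece in separators:
            out.append(piece)          # separator or empty edge piece
        elif re.fullmatch(r"[A-Za-z]+", piece):
            out.append("c+")           # first identifier: letters only
        elif re.fullmatch(r"[0-9]+", piece):
            out.append("d+")           # second identifier: digits only
        else:
            out.append("w+")           # third identifier: mixed string
    return "".join(out)
```

Applied to the paths discussed above, `/reach296/p/3816387.html` maps to `/w+/c+/d+.c+` and `/reach296/wlwmanifest.xml` maps to `/w+/c+.c+`.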
After the primary pattern of the URL to be detected is obtained, the URL to be detected has to be merged, according to its primary pattern, into the URL family to which it belongs. Here the URL family is a pattern tree built from the URLs in the family by layer-by-layer abstraction. An embodiment of the present invention provides a method of merging the URL to be detected into the URL family, including: obtaining N historical URLs from the URL family, N being a positive integer; and performing the following processing for each historical URL: comparing the URL to be detected with the historical URL pairwise to obtain the differences between them; abstracting the differences between the URL to be detected and the historical URL layer by layer, thereby constructing a sub-pattern tree between the URL to be detected and the historical URL; and merging the sub-pattern tree into the URL family. Preferably, the N historical URLs taken from the URL family each belong to a different deepest URL pattern, a deepest URL pattern being the URL pattern of greatest depth in a branch of the URL family tree. Alternatively, the N most recent historical URLs in the URL family can be taken for processing. In theory the ideal approach is to compare the URL to be detected pairwise with every historical URL in its family, but this is too complex for large sites. Whether the historical URLs are chosen by URL pattern or by URL processing time, the N historical URLs chosen should be as representative as possible, so that processing speed improves while detection accuracy is preserved.
When processing each historical URL, in a specific implementation, the URL to be detected and the historical URL have to be compared character by character, and the differences abstracted over multiple layers. Table 3 shows an example of a way of marking comparison results provided in an embodiment of the present invention, in which different characters denote different comparison results. As shown in Table 3, fragment 1 and fragment 2 denote character string fragments in the path information of the URL to be detected and of the historical URL respectively. A character that exists only in fragment 1 is marked with '-', a character that exists only in fragment 2 is marked with '+', and a character that exists in both fragment 1 and fragment 2 is marked with ' '.
Table three
Character | Meaning |
'-' | Exists only in fragment 1 |
'+' | Exists only in fragment 2 |
' ' | Exists in both fragment 1 and fragment 2 |
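The marking scheme in Table three can be sketched in Python as follows. This is only an illustration under stated assumptions: the patent does not name a library, so the use of `difflib.SequenceMatcher` from the Python standard library and the helper name `mark_differences` are choices made for this sketch.

```python
from difflib import SequenceMatcher


def mark_differences(fragment1, fragment2):
    """Return (marker, character) pairs using the markers of Table three.

    '-' : character exists only in fragment 1
    '+' : character exists only in fragment 2
    ' ' : character exists in both fragments
    """
    marks = []
    matcher = SequenceMatcher(None, fragment1, fragment2, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            marks.extend((" ", ch) for ch in fragment1[i1:i2])
        else:
            # 'replace', 'delete' and 'insert' are all covered here:
            # an empty slice simply contributes nothing.
            marks.extend(("-", ch) for ch in fragment1[i1:i2])
            marks.extend(("+", ch) for ch in fragment2[j1:j2])
    return marks


if __name__ == "__main__":
    for marker, ch in mark_differences("/cate/108705/", "/cate/108709/"):
        print(repr(marker), ch)
```

For the paths /cate/108705/ and /cate/108709/, the only entries not marked ' ' are the 5 of fragment 1 and the 9 of fragment 2.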
After the preliminary comparison result between the URL to be detected and the history URL is obtained, the differences between the two are abstracted for the first time. The purpose of this first abstraction is to represent letters in the differing part with "\c" and digits in the differing part with "\d". Optionally, this processing is done by regular-expression replacement; Table four gives a set of regular-expression replacement rules provided by an embodiment of the present invention.
Table four
Level | Regular expression | Replacement |
1 | '[a-zA-Z+]|%[0-9A-Z]{2}' | '\c' |
1 | '\d' | '\d' |
2 | '(<!\\)[a-zA-Z]' | '\c' |
2 | '\d' | '\d' |
3 | '(\\d)+' | '\d+' |
3 | '(\\c)+' | '\c+' |
4 | '[^/]+' | '\w+' |
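As a rough illustration of how levels 2 to 4 of Table four might be applied, the following Python sketch chains the replacements on a pattern that has already gone through the first abstraction. The function names are hypothetical. The level 2 rule is written with a negative lookbehind, `(?<!\\)`, which is an assumed reading of the table entry '(<!\\)[a-zA-Z]': it leaves the letter of an existing \c or \d marker untouched.

```python
import re


def abstract_level_2(pattern):
    """Level 2: every remaining letter becomes \\c and every digit becomes \\d."""
    # Assumption: the lookbehind skips letters that belong to an existing marker.
    pattern = re.sub(r"(?<!\\)[a-zA-Z]", lambda m: r"\c", pattern)
    return re.sub(r"[0-9]", lambda m: r"\d", pattern)


def abstract_level_3(pattern):
    """Level 3: collapse runs of \\d into \\d+ and runs of \\c into \\c+."""
    pattern = re.sub(r"(\\d)+", lambda m: r"\d+", pattern)
    return re.sub(r"(\\c)+", lambda m: r"\c+", pattern)


def abstract_level_4(pattern):
    """Level 4: every path segment between '/' separators becomes \\w+."""
    return re.sub(r"[^/]+", lambda m: r"\w+", pattern)


if __name__ == "__main__":
    level1 = r"/cate/10870\d/"
    level2 = abstract_level_2(level1)
    level3 = abstract_level_3(level2)
    level4 = abstract_level_4(level3)
    for name, value in [("1", level1), ("2", level2), ("3", level3), ("4", level4)]:
        print("level", name, value)
```

Each function is meant to be applied once to the output of the previous level; the sketch does not try to make the rules idempotent.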
After the first abstraction, abstraction continues until the URL to be detected and the history URL are abstracted into the same URL pattern. At that point, the URL to be detected, the history URL, and the URL patterns produced by the successive abstractions together constitute the sub-pattern tree of the URL to be detected and the history URL. Because the history URL and the URL to be detected come from the same URL race, the two are certain to reach an identical URL pattern eventually; that is, a sub-pattern tree with a common root node is certain to be generated. Take the URL paths /cate/108705/ and /cate/108709/ as an example. The differences between them can be preliminarily marked as: /,c,a,t,e,/,1,0,8,7,0,-5,+9,/. Only the digits 5 and 9 differ. Applying the first abstraction to the two URLs yields /cate/10870\d/, so the two URLs already reach the same URL pattern after the first abstraction. The sub-pattern tree is then:
As another example, for the URL paths /pick/1/ and /cate/12/, the sub-pattern tree obtained after comparison and abstraction is:
For each history URL, after the sub-pattern tree of the URL to be detected and that history URL is obtained, the sub-pattern tree is merged into the pattern tree of the URL race. Preferably, each node of the sub-pattern tree is compared with the nodes of the URL race pattern tree by means of regular expressions. Table five gives a regular-expression comparison rule provided by an embodiment of the present invention. As shown in Table five, node A is a URL pattern in the sub-pattern tree and node B is a URL pattern in the URL race pattern tree. When the expression of node A completely contains the expression of node B, the comparison result is 1. For example, suppose the expression of node A is '/cate/\d+/' and the expression of node B is '/cate/1087\d\d/'. Here "\d+" denotes a string of multiple digits, and "1087\d\d" denotes a string whose first four digits are fixed as 1087 and whose last two characters are digits. Clearly '/cate/\d+/' contains '/cate/1087\d\d/', so the comparison result is 1. Correspondingly, when the expression of node B completely contains the expression of node A, the comparison result is -1. When the expression of node A cannot contain the expression of node B and the expression of node B cannot contain the expression of node A, the comparison result is 0.
Table five
More specifically, to compare the expression of node A with the expression of node B, the expression of node A can be translated into a regular expression, the expression of node B can be translated into ordinary character strings, and those strings can then be matched against the regular expression translated from node A. When the expression of node B is translated, the possible parameter values must be taken into account. To avoid accidental matches, one expression is preferably translated into at least two character strings, and the match counts as successful only when all of them match. For example, the node B expression /cat/\d/ can be translated into /cat/0/ and /cat/9/; node A is considered to contain node B only if both strings match the regular expression translated from the node A expression. Table six gives example translation rules for URL pattern expressions provided by an embodiment of the present invention.
Table six
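The containment test described above can be sketched in Python as follows. All names are hypothetical, and the concrete translation choices are assumptions made for this sketch, not the contents of Table six: \c is read as one ASCII letter, \d as one digit, \w as one non-'/' character, and each abstract marker is expanded into two sample values (for example 0 and 9 for \d), with a trailing + doubling the second sample.

```python
import re

# One token is either an abstract marker (\c, \d, \w, optionally followed by +)
# or a single literal character.
TOKEN = re.compile(r"\\([cdw])(\+?)|(.)", re.S)
REGEX_CLASS = {"c": "[a-zA-Z]", "d": "[0-9]", "w": "[^/]"}
SAMPLES = {"c": ("a", "Z"), "d": ("0", "9"), "w": ("a", "9")}  # assumed values


def to_regex(expression):
    """Translate a URL pattern expression into a compiled regular expression."""
    parts = []
    for marker, plus, literal in TOKEN.findall(expression):
        parts.append(REGEX_CLASS[marker] + plus if marker else re.escape(literal))
    return re.compile("".join(parts))


def to_samples(expression):
    """Translate a URL pattern expression into two ordinary character strings."""
    first, second = [], []
    for marker, plus, literal in TOKEN.findall(expression):
        if marker:
            low, high = SAMPLES[marker]
            first.append(low)
            second.append(high * 2 if plus else high)
        else:
            first.append(literal)
            second.append(literal)
    return "".join(first), "".join(second)


def contains(outer, inner):
    """True if every sample string of `inner` matches the regex of `outer`."""
    regex = to_regex(outer)
    return all(regex.fullmatch(sample) for sample in to_samples(inner))


def compare_nodes(node_a, node_b):
    """Return 1 if A contains B, -1 if B contains A, otherwise 0."""
    if contains(node_a, node_b):
        return 1
    if contains(node_b, node_a):
        return -1
    return 0


if __name__ == "__main__":
    print(to_samples(r"/cat/\d/"))
    print(compare_nodes(r"/cate/\d+/", r"/cate/1087\d\d/"))
```

Because containment is judged from a small number of sample strings, this is a heuristic check rather than a proof of inclusion; using two samples per expression follows the text's suggestion for reducing accidental matches.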
More specifically, an embodiment of the present invention provides a way of merging pattern trees, including:
In order from shallow to deep, the nodes of the sub-pattern tree are compared in turn with the nodes of the URL race pattern tree, where a sub-pattern tree node is a URL pattern in the sub-pattern tree and a URL race pattern tree node is a URL pattern in the URL race pattern tree. If neither the sub-pattern tree node nor the URL race pattern tree node contains the other, the two nodes are merged directly. If the sub-pattern tree node contains the URL race pattern tree node, the child nodes of the sub-pattern tree under that node are compared with the URL race pattern tree node, until the sub-pattern tree has been merged into the URL race pattern tree. If the URL race pattern tree node contains the sub-pattern tree node, the child nodes of the URL race pattern tree are compared with the sub-pattern tree node, until the sub-pattern tree has been merged into the URL race pattern tree.
Merging a sub-pattern tree with a URL race pattern tree is in fact a recursive operation, with the following three possible cases:
Case 1: The two pattern trees are compared starting from their root nodes. When the root node comparison result is 0, the sub-pattern tree is merged with the URL race pattern tree directly. Fig. 2 is the first schematic diagram of pattern tree merging provided by an embodiment of the present invention; as shown in Fig. 2, the root nodes of the two pattern trees are merged directly.
Case 2: When the root node comparison result is -1, the child pattern tree in the URL race pattern tree is taken as the new pattern tree to be compared and is compared with the sub-pattern tree again. These steps are repeated until the sub-pattern tree has been merged into the URL race pattern tree. Fig. 3 is the second schematic diagram of pattern tree merging provided by an embodiment of the present invention; as shown in Fig. 3, the root node of the sub-pattern tree finally becomes a URL pattern node in the URL race pattern tree, and the sub-pattern tree is merged into the URL race pattern tree as one of its child pattern trees.
Case 3: When the root node comparison result is 1, the roles of the sub-pattern tree and the URL race pattern tree are exchanged and the steps of case 2 are repeated. That is, the child pattern tree in the sub-pattern tree is taken as the new pattern tree to be compared and is compared with the URL race pattern tree again; these steps are repeated until the sub-pattern tree and the URL race pattern tree have been merged.
An embodiment of the present invention provides a concrete example of merging a sub-pattern tree with a URL race pattern tree, in which the sub-pattern tree is:
After the two are merged, the new URL race pattern tree obtained is:
After the N sub-pattern trees obtained by comparing the URL to be detected pairwise with the N history URLs have all been merged into the URL race pattern tree, the representative pattern of the URL to be detected is extracted from the URL race pattern tree. Specifically, an embodiment of the present invention provides a method for determining the representative pattern of the URL to be detected, including: starting from the URL to be detected, traversing the pattern tree of the URL race level by level from deep to shallow; and, when a pattern node is found under which the number of URLs that have not undergone abstraction exceeds a preset threshold, taking that pattern node as the representative pattern of the URL to be detected. The following illustration uses the new URL race pattern tree obtained above by merging the sub-pattern tree with the URL race pattern tree. Note that this tree is not the one from which the representative pattern is extracted in a practical application; in actual use, the representative pattern is extracted from the URL race pattern tree into which all N sub-pattern trees have been merged. Suppose the URL to be detected is /cate/108709/. In the new URL race pattern tree above, its corresponding URL patterns, arranged by level from deep to shallow, and the number of unabstracted URLs under each URL pattern are:
As can be seen, one URL to be detected has URL patterns at different levels. Preferably, the representative pattern is the deepest pattern under which the number of unabstracted URLs sharing that pattern reaches the preset threshold. For example, when the preset threshold is 2, the representative pattern is /cate/10870\d/; when the preset threshold is 3, the representative pattern is /\c+/\d+/. Optionally, the preset threshold can be set from experience or actual conditions, or obtained by calculation. Preferably, an embodiment of the present invention provides a method for calculating the preset threshold T, as shown in formula one:
T = max(3, number of URLs / max(depth of the URL race pattern tree, 1)) (formula one)
As formula one shows, T is never less than 3.
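Formula one and the deep-to-shallow selection of the representative pattern can be sketched in Python as follows. The function names are hypothetical, and the per-pattern URL counts in the usage example are made-up illustrative values, not figures from the patent. The sketch treats a pattern as qualifying when its count reaches the threshold, which is the reading consistent with the worked example for thresholds 2 and 3.

```python
def preset_threshold(url_count, tree_depth):
    """Formula one: T = max(3, URL count / max(tree depth, 1))."""
    return max(3, url_count / max(tree_depth, 1))


def select_representative(levels, threshold):
    """Pick the deepest pattern whose unabstracted-URL count reaches the threshold.

    `levels` lists (pattern, unabstracted_url_count) pairs for one URL,
    ordered from the deepest pattern to the shallowest.
    """
    for pattern, unabstracted_url_count in levels:
        if unabstracted_url_count >= threshold:
            return pattern
    return None  # no pattern qualifies


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    levels = [
        (r"/cate/10870\d/", 2),
        (r"/cate/\d+/", 2),
        (r"/\c+/\d+/", 3),
    ]
    print(select_representative(levels, 2))
    print(select_representative(levels, 3))
    print(preset_threshold(url_count=100, tree_depth=4))
```

The inner `max(tree_depth, 1)` only guards against division by zero for an empty or single-level tree; the outer `max` enforces the floor of 3.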
After the representative pattern is determined, it must further be determined whether the representative pattern has already been tested. If any history URL under the representative pattern has been tested, the representative pattern is considered tested. When the representative pattern has been tested, its detection result is used directly as the detection result of the URL to be detected. If the representative pattern has not been tested, the URL to be detected is detected. Table seven gives an example of detecting a URL provided by an embodiment of the present invention. As shown in Table seven, for the URL to be detected /cate/108709/: when its representative pattern is /cate/10870\d, the extracted parameter is 9, and a payload is built from the parameter 9 for URL detection; when its representative pattern is /cate/\d\d\d\d\d\d, the extracted parameter is 108709, and a payload is built from the parameter 108709 for URL detection; when its representative pattern is /\c\c\c\c/\d\d\d\d\d\d, the extracted parameters are cate and 108709, and a payload is built from the parameters cate and 108709 for URL detection. Further, after detection of the URL to be detected is complete, its detection result is recorded as the detection result of the representative pattern.
Table seven
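The parameter extraction in the example above can be sketched in Python as follows. This is an illustration under assumptions: the function names are hypothetical, \c, \d and \w are read as a letter, a digit and a non-'/' character respectively, and each run of adjacent abstract markers is treated as one parameter. The patterns are written with a trailing '/' so that they match the full path /cate/108709/.

```python
import re

# One token is either an abstract marker (\c, \d, \w, optionally followed by +)
# or a single literal character.
TOKEN = re.compile(r"\\([cdw])(\+?)|(.)", re.S)
REGEX_CLASS = {"c": "[a-zA-Z]", "d": "[0-9]", "w": "[^/]"}


def extraction_regex(pattern):
    """Build a regex in which every run of abstract markers is a capture group."""
    parts, run = [], []

    def flush():
        # Close the current run of abstract markers as one capture group.
        if run:
            parts.append("(" + "".join(run) + ")")
            run.clear()

    for marker, plus, literal in TOKEN.findall(pattern):
        if marker:
            run.append(REGEX_CLASS[marker] + plus)
        else:
            flush()
            parts.append(re.escape(literal))
    flush()
    return re.compile("".join(parts))


def extract_parameters(url_path, representative_pattern):
    """Return the abstracted parts of the path, or None if it does not match."""
    match = extraction_regex(representative_pattern).fullmatch(url_path)
    return list(match.groups()) if match else None


if __name__ == "__main__":
    path = "/cate/108709/"
    for pattern in (r"/cate/10870\d/", r"/cate/\d\d\d\d\d\d/", r"/\c\c\c\c/\d\d\d\d\d\d/"):
        print(pattern, extract_parameters(path, pattern))
```

Only the abstracted parts are returned, so the literal parts of the representative pattern never end up in a payload.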
In summary, embodiments of the present invention provide a URL detection method, including: obtaining the path information of a URL to be detected, where the URL to be detected is a URL that has undergone rewrite processing; abstracting the path information of the URL to be detected to obtain its primary pattern; determining, according to the primary pattern, the URL race to which the URL to be detected belongs, where the URL race has the same primary pattern as the URL to be detected; merging the URL to be detected into the URL race; selecting the representative pattern of the URL to be detected from the URL race; and extracting the parameters in the URL to be detected according to the representative pattern and detecting them, with the detection result of the URL to be detected used as the detection result of the representative pattern. The primary pattern obtained by abstracting the path information is a relatively coarse feature of the URL to be detected. The URL to be detected is merged, according to its primary pattern, into the URL race to which it belongs, and the URLs in that race share this coarse feature with the URL to be detected. Within the URL race, the coarse feature is subdivided under the primary pattern into multiple URL pattern branches. One of these URL patterns, the representative pattern, has structural features that can stand for the structural features of the URL to be detected, and in it the distinguishing features of the URL to be detected have been abstracted. Therefore, when parameters are extracted, only the abstracted part of the URL to be detected needs to be extracted, which improves the accuracy of parameter extraction and in turn the accuracy of URL detection.
Based on the same technical concept, an embodiment of the present invention also provides a detection device that can carry out the method embodiments above. Fig. 4 is a schematic structural diagram of a detection device provided by an embodiment of the present invention. As shown in Fig. 4, the detection device 400 includes:
an acquisition module 401, configured to obtain the path information of a URL to be detected, where the URL to be detected is a URL that has undergone rewrite processing;
an abstraction module 402, configured to abstract the path information of the URL to be detected to obtain the primary pattern of the URL to be detected;
a query module 403, configured to determine, according to the primary pattern of the URL to be detected, the URL race to which the URL to be detected belongs, where the URL race has the same primary pattern as the URL to be detected;
a merging module 404, configured to merge the URL to be detected into the URL race;
an extraction module 405, configured to select the representative pattern of the URL to be detected from the URL race;
a processing module 406, configured to extract the parameters in the URL to be detected according to the representative pattern and detect them, and to use the detection result of the URL to be detected as the detection result of the representative pattern.
Optionally, the abstraction module 402 is specifically configured to:
convert, by regular-expression replacement, the non-special characters in the URL to be detected into letters or digits, where the non-special characters include the characters in the URL to be detected that do not act as separators;
abstract, according to set rules, the character string between separators in the URL to be detected after the regular-expression replacement.
Optionally, the abstraction module 402 is specifically configured to:
if the character string between separators is a string of letters, abstract the string of letters into a first mark;
if the character string between separators is a string of digits, abstract the string of digits into a second mark;
if the character string between separators consists of both digits and letters, abstract it into a third mark.
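The three abstraction rules above can be sketched in Python as follows. Several points are assumptions made for this sketch because this passage does not fix them: '/' is taken as the only separator, the three marks are chosen as \c+, \d+ and \w+, the preceding regular-expression replacement step is omitted, and any segment that is neither purely letters nor purely digits receives the third mark. The function name `primary_pattern` is hypothetical.

```python
import re


def primary_pattern(path, marks=(r"\c+", r"\d+", r"\w+")):
    """Abstract each '/'-separated segment of a URL path into one of three marks."""
    letters_mark, digits_mark, mixed_mark = marks
    abstracted = []
    for segment in path.split("/"):
        if not segment:
            abstracted.append("")           # keep leading/trailing separators
        elif re.fullmatch(r"[a-zA-Z]+", segment):
            abstracted.append(letters_mark)  # first mark: letters only
        elif re.fullmatch(r"[0-9]+", segment):
            abstracted.append(digits_mark)   # second mark: digits only
        else:
            abstracted.append(mixed_mark)    # third mark: digits and letters
    return "/".join(abstracted)


if __name__ == "__main__":
    for path in ("/cate/108709/", "/pick/1/", "/item/a1b2/"):
        print(path, "->", primary_pattern(path))
```

Under these assumptions, /pick/1/ and /cate/12/ receive the same primary pattern and therefore fall into the same URL race.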
Optionally, the URL race is a pattern tree built from the URLs in the URL race by layer-by-layer abstraction;
the merging module 404 is specifically configured to:
obtain N history URLs from the URL race, where N is a positive integer;
process each history URL as follows:
compare the URL to be detected with the history URL pairwise to obtain the differences between the URL to be detected and the history URL;
abstract the differences between the URL to be detected and the history URL layer by layer to construct a sub-pattern tree between the URL to be detected and the history URL;
merge the sub-pattern tree into the URL race.
Optionally, the extraction module 405 is specifically configured to:
starting from the URL to be detected, traverse the pattern tree of the URL race level by level from deep to shallow;
when a pattern node is found in the pattern tree under which the number of URLs that have not undergone abstraction exceeds a preset threshold, take that pattern node as the representative pattern of the URL to be detected.
Optionally, the processing module 406 is further configured to:
if the representative pattern has been tested, use the detection result of the representative pattern as the detection result of the URL to be detected.
Optionally, the processing module 406 is further configured to:
if none of the existing URL races has the same primary pattern as the URL to be detected, store the primary pattern of the URL to be detected as a newly added URL race;
detect the URL to be detected.
In summary, embodiments of the present invention provide a uniform resource locator (URL) detection method and detection device, including: obtaining the path information of a URL to be detected, where the URL to be detected is a URL that has undergone rewrite processing; abstracting the path information of the URL to be detected to obtain its primary pattern; determining, according to the primary pattern, the URL race to which the URL to be detected belongs, where the URL race has the same primary pattern as the URL to be detected; merging the URL to be detected into the URL race; selecting the representative pattern of the URL to be detected from the URL race; and extracting the parameters in the URL to be detected according to the representative pattern and detecting them, with the detection result of the URL to be detected used as the detection result of the representative pattern. The primary pattern obtained by abstracting the path information is a relatively coarse feature of the URL to be detected. The URL to be detected is merged, according to its primary pattern, into the URL race to which it belongs, and the URLs in that race share this coarse feature with the URL to be detected. Within the URL race, the coarse feature is subdivided under the primary pattern into multiple URL pattern branches. One of these URL patterns, the representative pattern, has structural features that can stand for the structural features of the URL to be detected, and in it the distinguishing features of the URL to be detected have been abstracted. Therefore, when parameters are extracted, only the abstracted part of the URL to be detected needs to be extracted, which improves the accuracy of parameter extraction and in turn the accuracy of URL detection.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.
Claims (16)
1. A uniform resource locator (URL) detection method, characterized by including:
obtaining path information of a URL to be detected, where the URL to be detected is a URL that has undergone rewrite processing;
abstracting the path information of the URL to be detected to obtain a primary pattern of the URL to be detected;
determining, according to the primary pattern of the URL to be detected, the URL race to which the URL to be detected belongs, where the URL race has the same primary pattern as the URL to be detected;
merging the URL to be detected into the URL race;
selecting a representative pattern of the URL to be detected from the URL race;
extracting parameters in the URL to be detected according to the representative pattern and detecting them, and using the detection result of the URL to be detected as the detection result of the representative pattern.
2. The method according to claim 1, characterized in that abstracting the path information of the URL to be detected to obtain the primary pattern of the URL to be detected includes:
converting, by regular-expression replacement, the non-special characters in the URL to be detected into letters or digits, where the non-special characters include the characters in the URL to be detected that do not act as separators;
abstracting, according to set rules, the character string between separators in the URL to be detected after the regular-expression replacement.
3. The method according to claim 2, characterized in that abstracting, according to set rules, the character string between separators in the URL to be detected after the regular-expression replacement includes:
if the character string between separators is a string of letters, abstracting the string of letters into a first mark;
if the character string between separators is a string of digits, abstracting the string of digits into a second mark;
if the character string between separators consists of both digits and letters, abstracting it into a third mark.
4. The method according to claim 1, characterized in that merging the URL to be detected into the URL race includes:
the URL race being a pattern tree built from the URLs in the URL race by layer-by-layer abstraction;
obtaining N history URLs from the URL race, where N is a positive integer;
processing each history URL as follows:
comparing the URL to be detected with the history URL pairwise to obtain the differences between the URL to be detected and the history URL;
abstracting the differences between the URL to be detected and the history URL layer by layer to construct a sub-pattern tree between the URL to be detected and the history URL;
merging the sub-pattern tree into the URL race.
5. The method according to claim 4, characterized in that merging the sub-pattern tree into the URL race includes:
in order from shallow to deep, comparing the nodes of the sub-pattern tree in turn with the nodes of the URL race pattern tree, where a sub-pattern tree node is a URL pattern in the sub-pattern tree and a URL race pattern tree node is a URL pattern in the URL race pattern tree;
if neither the node of the sub-pattern tree nor the node of the URL race pattern tree contains the other, merging the sub-pattern tree node and the URL race pattern tree node directly;
if the node of the sub-pattern tree contains the node of the URL race pattern tree, comparing the child nodes of the sub-pattern tree under that node with the node of the URL race, until the sub-pattern tree has been merged into the URL race pattern tree;
if the node of the URL race pattern tree contains the node of the sub-pattern tree, comparing the child nodes of the URL race pattern tree with the node of the sub-pattern tree, until the sub-pattern tree has been merged into the URL race pattern tree.
6. The method according to claim 4, characterized in that selecting the representative pattern of the URL to be detected from the URL race includes:
starting from the URL to be detected, traversing the pattern tree of the URL race level by level from deep to shallow;
when a pattern node is found in the pattern tree under which the number of URLs that have not undergone abstraction exceeds a preset threshold, taking that pattern node as the representative pattern of the URL to be detected.
7. The method according to any one of claims 1 to 6, characterized by further including:
if the representative pattern has been tested, using the detection result of the representative pattern as the detection result of the URL to be detected.
8. The method according to any one of claims 1 to 6, characterized by further including:
if none of the existing URL races has the same primary pattern as the URL to be detected, storing the primary pattern of the URL to be detected as a newly added URL race;
detecting the URL to be detected.
9. A uniform resource locator (URL) detection device, characterized by including:
an acquisition module, configured to obtain path information of a URL to be detected, where the URL to be detected is a URL that has undergone rewrite processing;
an abstraction module, configured to abstract the path information of the URL to be detected to obtain a primary pattern of the URL to be detected;
a query module, configured to determine, according to the primary pattern of the URL to be detected, the URL race to which the URL to be detected belongs, where the URL race has the same primary pattern as the URL to be detected;
a merging module, configured to merge the URL to be detected into the URL race;
an extraction module, configured to select a representative pattern of the URL to be detected from the URL race;
a processing module, configured to extract parameters in the URL to be detected according to the representative pattern and detect them, and to use the detection result of the URL to be detected as the detection result of the representative pattern.
10. The device according to claim 9, characterized in that
the abstraction module is specifically configured to:
convert, by regular-expression replacement, the non-special characters in the URL to be detected into letters or digits, where the non-special characters include the characters in the URL to be detected that do not act as separators;
abstract, according to set rules, the character string between separators in the URL to be detected after the regular-expression replacement.
11. The device according to claim 10, characterized in that
the abstraction module is specifically configured to:
if the character string between separators is a string of letters, abstract the string of letters into a first mark;
if the character string between separators is a string of digits, abstract the string of digits into a second mark;
if the character string between separators consists of both digits and letters, abstract it into a third mark.
12. The device according to claim 9, characterized in that
the URL race is a pattern tree built from the URLs in the URL race by layer-by-layer abstraction;
the merging module is specifically configured to:
obtain N history URLs from the URL race, where N is a positive integer;
process each history URL as follows:
compare the URL to be detected with the history URL pairwise to obtain the differences between the URL to be detected and the history URL;
abstract the differences between the URL to be detected and the history URL layer by layer to construct a sub-pattern tree between the URL to be detected and the history URL;
merge the sub-pattern tree into the URL race.
13. The device according to claim 12, characterized in that
the merging module is specifically configured to:
in order from shallow to deep, compare the nodes of the sub-pattern tree in turn with the nodes of the URL race pattern tree, where a sub-pattern tree node is a URL pattern in the sub-pattern tree and a URL race pattern tree node is a URL pattern in the URL race pattern tree;
if neither the node of the sub-pattern tree nor the node of the URL race pattern tree contains the other, merge the sub-pattern tree node and the URL race pattern tree node directly;
if the node of the sub-pattern tree contains the node of the URL race pattern tree, compare the child nodes of the sub-pattern tree under that node with the node of the URL race, until the sub-pattern tree has been merged into the URL race pattern tree;
if the node of the URL race pattern tree contains the node of the sub-pattern tree, compare the child nodes of the URL race pattern tree with the node of the sub-pattern tree, until the sub-pattern tree has been merged into the URL race pattern tree.
14. The device according to claim 13, characterized in that
the extraction module is specifically configured to:
starting from the URL to be detected, traverse the pattern tree of the URL race level by level from deep to shallow;
when a pattern node is found in the pattern tree under which the number of URLs that have not undergone abstraction exceeds a preset threshold, take that pattern node as the representative pattern of the URL to be detected.
15. The device according to any one of claims 9 to 14, characterized in that
the processing module is further configured to:
if the representative pattern has been tested, use the detection result of the representative pattern as the detection result of the URL to be detected.
16. The device according to any one of claims 9 to 14, characterized in that
the processing module is further configured to:
if none of the existing URL races has the same primary pattern as the URL to be detected, store the primary pattern of the URL to be detected as a newly added URL race;
detect the URL to be detected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710108755.7A CN106940711B (en) | 2017-02-27 | 2017-02-27 | URL detection method and detection device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710108755.7A CN106940711B (en) | 2017-02-27 | 2017-02-27 | URL detection method and detection device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106940711A (en) | 2017-07-11 |
CN106940711B (en) | 2020-02-07 |
Family
ID=59469693
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710108755.7A Active CN106940711B (en) | 2017-02-27 | 2017-02-27 | URL detection method and detection device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106940711B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898046A (en) * | 2020-07-16 | 2020-11-06 | 北京天空卫士网络安全技术有限公司 | Redirection management method and device |
CN111935133A (en) * | 2020-08-06 | 2020-11-13 | 北京顶象技术有限公司 | White list generation method and device |
CN113839940A (en) * | 2021-09-18 | 2021-12-24 | 北京知道创宇信息技术股份有限公司 | URL pattern tree-based defense method and device, electronic equipment and readable storage medium |
CN114650152A (en) * | 2020-12-17 | 2022-06-21 | 中国科学院计算机网络信息中心 | Method and system for detecting vulnerability of super computing center |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101727447A (en) * | 2008-10-10 | 2010-06-09 | 浙江搜富网络技术有限公司 | Generation method and device of regular expression based on URL |
CN102222187A (en) * | 2011-06-02 | 2011-10-19 | 国家计算机病毒应急处理中心 | Domain name structural feature-based hang horse web page detection method |
CN102739679A (en) * | 2012-06-29 | 2012-10-17 | 东南大学 | URL(Uniform Resource Locator) classification-based phishing website detection method |
CN104699851A (en) * | 2015-04-08 | 2015-06-10 | 上海理想信息产业(集团)有限公司 | Service tag extension method in big data environment |
CN105912573A (en) * | 2016-03-30 | 2016-08-31 | 北京网康科技有限公司 | Data updating method and data updating device |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898046A (en) * | 2020-07-16 | 2020-11-06 | 北京天空卫士网络安全技术有限公司 | Redirection management method and device |
CN111898046B (en) * | 2020-07-16 | 2024-02-13 | 北京天空卫士网络安全技术有限公司 | Method and device for redirection management |
CN111935133A (en) * | 2020-08-06 | 2020-11-13 | 北京顶象技术有限公司 | White list generation method and device |
CN114650152A (en) * | 2020-12-17 | 2022-06-21 | 中国科学院计算机网络信息中心 | Method and system for detecting vulnerability of super computing center |
CN114650152B (en) * | 2020-12-17 | 2023-06-20 | 中国科学院计算机网络信息中心 | Super computing center vulnerability detection method and system |
CN113839940A (en) * | 2021-09-18 | 2021-12-24 | 北京知道创宇信息技术股份有限公司 | URL pattern tree-based defense method and device, electronic equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN106940711B (en) | 2020-02-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106940711A (en) | A kind of URL detection methods and detection means | |
CN107122340B (en) | A kind of similarity detection method of the science and technology item return based on synonym analysis | |
CN110032737A (en) | A kind of boundary combinations name entity recognition method neural network based | |
US9734147B2 (en) | Clustering repetitive structure of asynchronous web application content | |
CN103399872B (en) | The method and apparatus that webpage capture is optimized | |
CN109005145A (en) | A kind of malice URL detection system and its method extracted based on automated characterization | |
CN107967208A (en) | A kind of Python resource sensitive defect code detection methods based on deep neural network | |
CN104615542B (en) | A kind of method of the fragility association analysis auxiliary bug excavation based on function call | |
CN107423391A (en) | The information extracting method of Web page structural data | |
CN103154884B (en) | Mode detection | |
CN104243315A (en) | Apparatus and Method for Uniquely Enumerating Paths in a Parse Tree | |
CN106095979A (en) | URL merging treatment method and apparatus | |
CN106789912A (en) | Router data plane anomaly detection method based on classification regression tree | |
CN107341399A (en) | Assess the method and device of code file security | |
CN106708952A (en) | Web page clustering method and device | |
CN107092670A (en) | A kind of visual network crawler system and analysis method based on embedded browser | |
CN112052413B (en) | URL fuzzy matching method, device and system | |
CN107066548A (en) | The method that web page interlinkage is extracted in a kind of pair of dimension classification | |
CN105550169A (en) | Method and device for identifying point of interest names based on character length | |
CN104580254B (en) | A kind of fishing website identifying system and method | |
CN112989348A (en) | Attack detection method, model training method, device, server and storage medium | |
CN103870495B (en) | Method and device for extracting information from website | |
CN103927325B (en) | A kind of method and device classified to URL | |
CN106874340A (en) | A kind of web page address sorting technique and device | |
CN109561163A (en) | The generation method and device of uniform resource locator rewriting rule |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CP01 | Change in the name or title of a patent holder | Address after: 100089 Beijing city Haidian District Road No. 4 North wa Yitai three storey building; Patentee after: NSFOCUS Technologies Group Co.,Ltd.; Patentee after: NSFOCUS TECHNOLOGIES Inc.; Address before: 100089 Beijing city Haidian District Road No. 4 North wa Yitai three storey building; Patentee before: NSFOCUS INFORMATION TECHNOLOGY Co.,Ltd.; Patentee before: NSFOCUS TECHNOLOGIES Inc. |