CN105302815B - Method and device for filtering uniform resource locators (URLs) of web pages
Abstract
The invention discloses a method and device for filtering uniform resource locators (URLs) of web pages. The method comprises: obtaining a set of URLs to be processed, where the set includes the URLs of multiple web pages to be processed; and performing the following filter operation on each URL in the set, where the URL on which the filter operation is currently being performed is the current URL: judging, according to a filter identifier in a preset configuration file, whether the current URL is a URL to be detected; if the current URL is a URL to be detected, matching the current URL according to the filter field in the configuration file; and if the current URL is successfully matched according to the filter field, filtering the current URL out of the set of URLs to be processed. The invention solves the technical problem that the prior art cannot filter out the URLs of spam web pages, so that Web security scanning is performed only after the URLs of spam pages have been filtered out, which improves the efficiency of Web security scanning.
Description
Technical field
The present invention relates to the computer field, and in particular to a method and device for filtering uniform resource locators (URLs) of web pages.
Background
When Web security scanning is performed on Common Gateway Interface (CGI) pages, it is usually necessary to collect as many CGIs as possible and to filter out the junk pages among them, in order to improve the efficiency of the scan. At present, those skilled in the art usually collect CGIs in one of two ways: first, by crawling URLs on the Internet with a web crawler; second, by obtaining CGIs from the traffic of a bypass WAF (web application firewall). However, both methods inevitably collect many spam pages, where a spam page may be a page that cannot be accessed or that does not exist. These spam pages are meaningless for Web security scanning and can even significantly reduce its efficiency.
As the number of collected CGIs keeps growing, the number of spam pages collected by these methods grows with it. During Web security scanning, it therefore becomes very important to quickly identify the spam pages among a massive number of URLs and to filter out the URLs corresponding to them.
However, no effective solution to this problem has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a method and device for filtering uniform resource locators (URLs) of web pages, to solve at least the technical problem that the prior art cannot filter out the URLs of spam web pages.
According to one aspect of the embodiments of the present invention, a method for filtering URLs of web pages is provided, comprising: obtaining a set of URLs to be processed, where the set includes the URLs of multiple web pages to be processed; and performing the following filter operation on each URL in the set, where the URL on which the filter operation is currently being performed is the current URL: judging, according to a filter identifier in a preset configuration file, whether the current URL is a URL to be detected; if the current URL is a URL to be detected, matching the current URL according to the filter field in the configuration file; and if the current URL is successfully matched according to the filter field, filtering the current URL out of the set of URLs to be processed.
According to another aspect of the embodiments of the present invention, a device for filtering URLs of web pages is also provided, comprising: an acquiring unit, configured to obtain a set of URLs to be processed, where the set includes the URLs of multiple web pages to be processed; and a filter unit, configured to perform the following filter operation on each URL in the set, where the URL on which the filter operation is currently being performed is the current URL: judging, according to a filter identifier in a preset configuration file, whether the current URL is a URL to be detected; when the current URL is a URL to be detected, matching the current URL according to the filter field in the configuration file; and when the current URL is successfully matched according to the filter field, filtering the current URL out of the set of URLs to be processed.
In the embodiments of the present invention, the obtained URLs to be processed are filtered by using a configuration file that includes at least a filter identifier and a filter field. The filter identifier is used to judge whether a URL to be processed is a URL to be detected, which provides a preliminary screening of the URLs. The URLs to be detected are then matched against the filter field, and the URLs that match successfully are filtered out. As a result, the URLs corresponding to unnecessary spam pages are no longer scanned during Web security scanning, which improves the efficiency of Web security scanning and solves the technical problem that the prior art cannot filter out the URLs of spam web pages.
In addition, the URLs to be detected are matched in a predetermined matching mode by using the characteristic parameters and/or feature strings in the filter field, so that URLs are filtered precisely. This achieves the technical effect of improving the accuracy of filtering the URLs of web pages.
Brief description of the drawings
The drawings described here are provided for a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of an optional hardware environment in which the method for filtering URLs of web pages is applied, according to an embodiment of the present invention;
Fig. 2 is a flowchart of an optional method for filtering URLs of web pages according to an embodiment of the present invention;
Fig. 3 is a flowchart of an optional method for obtaining URLs of web pages according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a configuration file in an optional method for filtering URLs of web pages according to an embodiment of the present invention;
Fig. 5 is a flowchart of another optional method for filtering URLs of web pages according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of a configuration file in another optional method for filtering URLs of web pages according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of an optional device for filtering URLs of web pages according to an embodiment of the present invention; and
Fig. 8 is a schematic diagram of an optional server that applies the method for filtering URLs of web pages according to an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the drawings of this specification are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in an order other than the one illustrated or described. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units that are expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Embodiment 1
According to an embodiment of the present invention, a method for filtering uniform resource locators (URLs) of web pages is provided. The method can be applied in the hardware environment shown in Fig. 1, in which the filtering server 102 that filters the URLs of web pages can establish a link, over a network, with the web server 104 that hosts the web pages, and can filter the URLs to be processed that are sent by the web server 104. The network includes, but is not limited to, a wide area network, a metropolitan area network, or a local area network.
Optionally, as shown in Fig. 2, the method for filtering URLs of web pages in this embodiment includes:
S202, obtaining a set of URLs to be processed, where the set includes the URLs of multiple web pages to be processed;
S204, performing the following filter operation on each URL in the set of URLs to be processed, where the URL on which the filter operation is currently being performed is the current URL:
S2042, judging, according to the filter identifier in a preset configuration file, whether the current URL is a URL to be detected;
S2044, if the current URL is a URL to be detected, matching the current URL according to the filter field in the configuration file;
S2046, if the current URL is successfully matched according to the filter field, filtering the current URL out of the set of URLs to be processed;
S2048, if the current URL is not a URL to be detected, or if the current URL is not successfully matched according to the filter field, not filtering the current URL out of the set of URLs to be processed.
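Steps S202 to S2048 can be sketched in Python as follows. This is only a minimal illustration, not the patented implementation: the function names, the dictionary layout of the configuration file, and the exact comparison of host names are assumptions, and the matching against the filter field is passed in as a caller-supplied function.

```python
from urllib.parse import urlsplit


def is_url_to_be_detected(current_url, config):
    # S2042: "*" covers all pages; otherwise compare the URL's host name
    # with the preset domain name (exact comparison is an assumption).
    host = config["host"]
    return host == "*" or urlsplit(current_url).hostname == host


def filter_url_set(urls, config, matches_filter_field):
    # S204: run the filter operation on every URL in the set.
    remaining = []
    for current_url in urls:
        if is_url_to_be_detected(current_url, config) and matches_filter_field(
            current_url, config["rule"]
        ):
            continue  # S2046: matched, so the current URL is filtered out
        remaining.append(current_url)  # S2048: not filtered out
    return remaining
```

The function returns the URLs that remain in the set, so the result can be handed directly to a later scanning step.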
Optionally, in this embodiment, the method for filtering URLs of web pages can be applied during Web security scanning. For example, with reference to Fig. 1, before the Web security scan is performed, the set of URLs to be processed is obtained, where the set includes the URLs of multiple web pages to be processed, and the filter operation is performed on each URL in the set, so that the URLs corresponding to spam pages, which do not need a Web security scan, are filtered out of the massive number of URLs obtained by the filtering server 102. This is only an example, and this embodiment places no limitation on it.
Optionally, in this embodiment, as shown in Fig. 3, before the set of URLs to be processed is obtained, the interaction between the filtering server 102 and the web server 104 is as follows:
S302, the filtering server 102 sends, over the network, a request to the web server 104 to obtain the set of URLs to be processed;
S304, in response to the request, the web server 104 returns the set of URLs to be processed to the filtering server 102.
Optionally, in this embodiment, the configuration file is a file formed from a JSON string that includes the filter identifier and the filter field. JSON (JavaScript Object Notation) is a lightweight data-interchange format; because it is text-based, it is easy for people to read and easy for machines to parse and generate. The filter identifier may include, but is not limited to, the scope to which filtering of the set of URLs to be processed applies. For example, the scope may include, but is not limited to, all web pages (global) or a subset of web pages (local), where the subset can be selected by a preset domain name. The filter field may include, but is not limited to, the matching result that indicates that a URL to be detected is to be filtered, and the filter field may include, but is not limited to, multiple filter subfields. For example, the matching result may include, but is not limited to, a characteristic parameter used for matching and its matching mode, and a feature string used for matching and its matching mode.
For example, the filter identifier is denoted by "host". When the value of "host" is "*", as shown at 402 in Fig. 4, the filtering applies to all web pages; when the value of "host" is "domain name/IP", the filtering applies to the web pages corresponding to that "domain name/IP". If the current URL on which the filter operation is being performed satisfies the filter identifier, the current URL is judged to be a URL to be detected.
As another example, as shown at 404 in Fig. 4, the filter field is denoted by "rule", and "rule" may include, but is not limited to, the following subfields: 1) a characteristic parameter set on the status code "HttpCode"; 2) a feature string set on the message body "Content". For example, the configuration file configures the value of the status code "HttpCode" as "equal to" the number "200" and configures the string of the message body "Content" as "http://qzone.qq.com/gy/404/data.js". When the URL to be detected successfully matches all of the subfields in the filter field, the URL to be detected is judged to have matched successfully, and the current URL is filtered out of the set of URLs to be processed.
Optionally, in this embodiment, the configuration file may also include, but is not limited to, the type name of the configuration file and the attributes of the configuration file, where the attributes may include, but are not limited to, the time at which the configuration file was added and the person who added it. For example, as shown at 406 in Fig. 4, the type name of the configuration file is "gongyi404"; as shown at 408 in Fig. 4, the configuration file was added on "2013-10-13" by "zhangsan".
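A configuration file of the kind described for Fig. 4 might look like the JSON string below, loaded here with Python's standard json module. The keys "host", "rule", "HttpCode", and "Content" and all values come from the description; the key names "type", "add_time", and "adder", and the [matching mode, value] layout of each subfield, are assumptions made for illustration, since the text does not reproduce the figure itself.

```python
import json

# Hypothetical rendering of the Fig. 4 configuration file as a JSON string.
CONFIG_TEXT = """
{
  "type": "gongyi404",
  "host": "*",
  "rule": {
    "HttpCode": ["=", 200],
    "Content": ["substr", "http://qzone.qq.com/gy/404/data.js"]
  },
  "add_time": "2013-10-13",
  "adder": "zhangsan"
}
"""

config = json.loads(CONFIG_TEXT)

# The filter identifier decides the scope; the rule subfields must all match.
filter_identifier = config["host"]
filter_subfields = config["rule"]
```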
Optionally, in this embodiment, after filtering has been performed on the URLs to be processed, the URLs that remain after the URLs corresponding to spam pages have been filtered out are saved, so that they can be called when the Web security scan is performed, which improves the efficiency of Web security scanning.
Optionally, in this embodiment, the configuration file can be saved in the form of a hash table, and it can be saved in at least one of the following locations: a text file on disk, or a file on a database server. Optionally, the configuration file is loaded from that location only when filtering needs to be performed on the URLs to be processed, and the URLs in the set are then filtered. Optionally, in this embodiment, the configuration file may be loaded by, but is not limited to, traversing the hash table to look up the configuration file that corresponds to the current URL.
Optionally, the preset configuration file comprises multiple configuration files. In that case, judging whether the current URL is a URL to be detected according to the filter identifier, matching the current URL according to the filter field, and filtering the current URL out of the set of URLs to be processed are performed through the following steps: searching the multiple configuration files for a matching configuration file, where a matching configuration file is one whose filter identifier judges the current URL to be a URL to be detected and whose filter field successfully matches the current URL; as soon as one matching configuration file is found among the multiple configuration files, the current URL is filtered out of the set of URLs to be processed.
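The "one matching configuration file is enough" rule can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the per-file check (filter identifier plus filter field) is abstracted into a caller-supplied predicate `matches_config`.

```python
def has_matching_config(current_url, configs, matches_config):
    # any() stops at the first configuration file that matches,
    # so later files are not consulted once a match is found.
    return any(matches_config(current_url, cfg) for cfg in configs)


def filter_with_configs(urls, configs, matches_config):
    # Keep only the URLs for which no configuration file matches.
    return [
        url for url in urls
        if not has_matching_config(url, configs, matches_config)
    ]
```

A URL is kept only when every configuration file fails to match it, which mirrors the loop over the configuration files described for Fig. 5.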
Specifically, the filtering flow for the URLs of web pages is described with reference to Fig. 1 and the flowchart in Fig. 5. Assume that the set of URLs includes URL_1, URL_2, URL_3, URL_4, and URL_5, and that the multiple configuration files are P_1, P_2, and P_3.
As shown in Fig. 1 and Fig. 5, the method for filtering URLs of web pages in this embodiment includes:
S502, the filtering server 102 requests the set of URLs to be processed from the web server 104;
S504, the web server 104 returns the set of URLs to be processed to the filtering server 102;
S506, the filtering server 102 judges whether the filter operation has been performed on all URLs, where the set of URLs includes URL_1, URL_2, and URL_3; if there are still URLs in the set on which the filter operation has not been performed, step S508 is executed; if the filter operation has been performed on all URLs in the set, this round of filtering ends;
S508, the filtering server 102 reads one URL (for example, URL_3) on which the filter operation has not yet been performed;
S510, the filtering server 102 judges whether the matching operation has been performed for all configuration files; if there are still configuration files for which the matching operation has not been performed, step S512 is executed; if the matching operation (S514, or S514 and S516) has been performed for all configuration files, the flow returns to S506; the first time the filtering server 102 makes this judgment, it first loads all of the configuration files (for example, configuration files P_1, P_2, and P_3);
S512, the filtering server 102 reads one configuration file (for example, configuration file P_2) for which the matching operation has not yet been performed;
S514, the filtering server 102 judges whether the current URL satisfies the matching result indicated by the filter identifier; if so, step S516 is executed; otherwise, the flow returns to look up a new configuration file;
S516, the filtering server 102 matches the current URL (for example, URL_3) against the subfields in the filter field one by one;
S518, the filtering server 102 judges whether the matching succeeded; if it succeeded, the current URL (for example, URL_3) is judged to be a URL corresponding to a spam page and step S520 is executed; if it did not succeed, the flow returns to S510 to judge whether the matching operation has been performed for all configuration files (for example, the matching operation has not yet been performed for configuration file P_3);
S520, the filtering server 102 filters out the URL corresponding to the spam page and returns to step S506 to judge whether the filter operation has been performed on all URLs in the set.
As step S518 of the above flow shows, if the current URL (for example, URL_3) does not match the configuration file P_2 against which it is currently being matched, the flow returns to look up a new configuration file, that is, it reads from the multiple configuration files another configuration file for which the matching operation has not yet been performed, for example, configuration file P_3. If the current URL (for example, URL_3) successfully matches configuration file P_2, that is, configuration file P_2 is the matching configuration file that has been found, step S520 is executed, and the current URL (for example, URL_3), which has been judged to correspond to a spam page, is filtered out. The flow then returns to step S506 to judge whether the filter operation has been performed on all URLs in the set.
In the embodiment provided by the present invention, the filter identifier in the preset configuration file is used to select the URLs to be detected from the massive set of URLs to be processed, and the filter field in the configuration file is then used to match the URLs to be detected, so as to obtain the URLs that need to be filtered out of the set. The URLs corresponding to unnecessary spam pages in the obtained set are thus filtered effectively, so that the URLs corresponding to spam pages do not need to be scanned during Web security scanning, which improves the efficiency of Web security scanning.
As an optional solution, step S2042, judging whether the current URL is a URL to be detected according to the filter identifier in the preset configuration file, includes:
1) if the filter identifier is a field indicating that all web pages are to be filtered, judging that the current URL is a URL to be detected; or
2) if the filter identifier is a field indicating that a preset domain name is to be filtered, judging whether the current URL includes the preset domain name, and if the current URL includes the preset domain name, judging that the current URL is a URL to be detected.
This is illustrated with the following example. As shown at 402 in Fig. 4, the filter identifier is denoted by "host". When the value of "host" is "*", the filtering applies to all web pages, so the URLs corresponding to all web pages are judged to be URLs to be detected, and these URLs to be detected are used in the subsequent matching judgment.
This is further illustrated with the following example. In the configuration file shown in Fig. 6, when the value of "host" 602 is "www.sina.com/168.1.1.3", the filtering applies to the web pages of the Sina website. If the current URL on which the filter operation is being performed is judged to include the preset domain name, the current URL is judged to be a URL to be detected.
Optionally, in this embodiment, if the current URL does not fall within the scope indicated by the filter identifier, the current URL is not treated as a URL corresponding to a spam page, and no further filter-operation judgment is performed on it.
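The scope check driven by the "host" value can be sketched as below. It is only an illustration: treating "www.sina.com/168.1.1.3" as a domain name and an IP address separated by a slash, and comparing them with the host part of the URL, is one assumed reading of "domain name/IP", and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def in_filter_scope(current_url, host_value):
    # "*" means the filtering applies to all web pages.
    if host_value == "*":
        return True
    # Assumed reading: "domain/IP" lists alternative host names.
    allowed_hosts = host_value.split("/")
    hostname = urlsplit(current_url).hostname or ""
    return hostname in allowed_hosts
```

A URL that falls outside the scope is simply left in the set, since no configuration rule is evaluated for it.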
The embodiment provided through the invention screens above-mentioned set of URL conjunction to be processed by filter identifier, with
The URL to be detected in corresponding range is obtained, and then within the above range falls the corresponding url filtering of spam page, is realized pair
URL in preset range executes filter operation, is improved with reaching to url filtering accuracy.
As an optional solution, step S2044, matching the current URL according to the filter field in the configuration file, includes:
S1, performing on the current URL the matching operation indicated in the filter field;
S2, judging whether the current URL has been successfully matched according to whether the result of the matching operation satisfies the matching result indicated in the filter field.
Optionally, in this embodiment, the matching operation indicated in the filter field includes, but is not limited to, characteristic parameter matching and feature string matching. The characteristic parameter may include, but is not limited to, at least one of the following: the status code of the web page corresponding to the current URL, and/or the Content-Length field that indicates the size of the web page corresponding to the current URL. The feature string may include, but is not limited to, at least one of the following: the link of the current URL, or a partial string of the link of the current URL.
Optionally, in this embodiment, characteristic parameters and/or feature strings, together with their corresponding matching modes, can be set in the filter field in order to perform the matching operation on the URL to be detected. Optionally, in this embodiment, the matching modes for a characteristic parameter may include, but are not limited to: greater than the value of the characteristic parameter, less than the value of the characteristic parameter, and equal to the value of the characteristic parameter. In this embodiment, the matching modes for a feature string may include, but are not limited to, search matching and regular-expression matching.
For example, the current URL on which the filter operation is being performed is matched against the matching result indicated in the filter field. If the result obtained by performing the matching operation on the current URL satisfies the matching result indicated in the filter field, the current URL is judged to have been successfully matched.
In the embodiment provided by the present invention, the matching operation is performed on the URL to be detected by using the filter field, in order to further judge whether the URL to be detected corresponds to an unnecessary spam page. The URLs corresponding to spam pages are thus identified accurately, which improves the efficiency of Web security scanning and reduces its cost.
As an optional solution, S1, performing on the current URL the matching operation indicated in the filter field, includes: S10, judging whether the characteristic parameter of the current URL and the characteristic parameter in the filter field satisfy the matching condition in the configuration file, where the characteristic parameter includes the status code of the web page corresponding to the current URL, and/or the Content-Length field that indicates the size of the web page corresponding to the current URL. S2, judging whether the current URL has been successfully matched according to whether the result of the matching operation satisfies the matching result indicated in the filter field, includes: S20, if the characteristic parameter of the current URL and the characteristic parameter in the filter field satisfy the matching condition in the configuration file, judging that the current URL has been successfully matched.
This is illustrated with the following example. With reference to Fig. 6, assume that the matching operation indicated in the filter field for the current URL is characteristic parameter matching; it is then judged whether the characteristic parameter of the current URL and the characteristic parameter in the filter field satisfy the predetermined matching condition. For example, if the characteristic parameter in the filter field is the status code "HttpCode" and the predetermined matching condition is "=, 200", it is judged whether the characteristic parameter of the current URL (that is, the status code "HttpCode") is equal to 200. If the status code "HttpCode" of the current URL satisfies this matching condition, the current URL is judged to have been successfully matched.
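The three matching modes for a characteristic parameter (greater than, less than, equal to) can be sketched with Python's standard operator module. The function name and the (mode, value) tuple layout are assumptions for illustration; only the "=" mode with the value 200 is taken directly from the example above, and the symbols ">" and "<" are assumed spellings of the other two modes.

```python
import operator

# Map each matching mode to the comparison it stands for.
COMPARATORS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def match_characteristic_parameter(observed, condition):
    # condition is (matching mode, configured value), e.g. ("=", 200).
    mode, expected = condition
    return COMPARATORS[mode](observed, expected)


# The observed value would be the status code or Content-Length that the
# page behind the current URL actually returned.
status_matches = match_characteristic_parameter(200, ("=", 200))
```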
As an optional solution, S1, performing on the current URL the matching operation indicated in the filter field, includes: S10, judging whether the feature string of the current URL and the feature string in the filter field satisfy the matching condition in the configuration file, where the feature string includes at least one of the following: the link of the current URL, or a partial string of the link of the current URL. S2, judging whether the current URL has been successfully matched according to whether the result of the matching operation satisfies the matching result indicated in the filter field, includes: S20, if the feature string of the current URL and the feature string in the filter field satisfy the matching condition in the configuration file, judging that the current URL has been successfully matched.
This is illustrated with the following example. With reference to Fig. 6, assume that the matching operation indicated in the filter field for the current URL is feature string matching; it is then judged whether the feature string of the current URL and the feature string in the filter field satisfy the predetermined matching condition. For example, if the feature string in the filter field is the message body "Content" and the predetermined matching condition is "substr=, stc="http://news.sina.com/gj/303/data.js"", it is judged whether the feature string of the current URL (that is, the message body "Content") satisfies this matching condition, for example, whether it contains the link string that indicates international news on Sina News, as shown in Fig. 6. If the feature string is found in the current URL, the current URL is judged to have been successfully matched.
As another example, regular-expression matching is configured: a partial character string of the complete character string is used as the feature string, and regular-expression matching is used to determine whether the current URL contains the feature string set in the regular expression. URLs containing a specific partial character string can thus be filtered, so that the filtering of the current URL in this embodiment can target a whole class of URLs that contain a specific kind of character string.
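The two string-matching modes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `match_feature_string` and the mode label "regex" are hypothetical ("substr" is taken from the matching condition quoted above), and the sample page body is invented for the example.

```python
import re


def match_feature_string(text: str, mode: str, pattern: str) -> bool:
    """Return True if `text` satisfies the string-matching condition.

    mode "substr": plain search for `pattern` inside `text`.
    mode "regex":  regular-expression search (hypothetical label).
    """
    if mode == "substr":
        return pattern in text
    if mode == "regex":
        return re.search(pattern, text) is not None
    raise ValueError(f"unknown matching mode: {mode}")


# Invented message body that references the script named in the example.
body = '<script src="http://news.sina.com/gj/303/data.js"></script>'

print(match_feature_string(body, "substr", "http://news.sina.com/gj/303/data.js"))
print(match_feature_string(body, "regex", r"/gj/\d+/data\.js"))
```

The regular-expression form matches a whole class of links (any numeric directory under `/gj/`), which is the behavior the paragraph above attributes to regular-expression matching.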
In the embodiment provided by the present invention, the characteristic parameter and/or feature string of the current URL is matched against the characteristic parameter and/or feature string set in the filter field according to the predetermined matching mode. The URLs corresponding to spam pages among the URLs to be detected are thereby identified accurately and filtered out, and the Web security scan is then performed on the remaining URLs, which improves the efficiency of the Web security scan.
As an optional scheme, after the filter operation is performed on each URL in the to-be-processed URL set in step S206, the method further includes:
S1, performing a web page security scan operation on the web page to be processed indicated by each URL in the to-be-processed URL set from which the current URL has been filtered out.
This is illustrated with the following example. After the URLs corresponding to spam pages have been filtered out, the URLs remaining in the URL set are saved; when the web page security scan operation is performed, the saved URLs, which contain no spam pages, are called directly.
In the embodiment provided by the present invention, the web page security scan operation is performed on the web page to be processed indicated by each URL in the to-be-processed URL set from which the URLs corresponding to spam pages have been filtered out. This avoids performing the web page security scan operation on URLs corresponding to spam pages and improves the efficiency of the Web security scan.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
From the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
Embodiment 2
According to an embodiment of the present invention, a device for filtering uniform resource locators (URLs) of web pages is provided. The device can be applied in the hardware environment shown in Fig. 1, where the device is located in the filtering server 102 that filters the URLs of web pages. The filtering server 102 can establish a link over a network with the web page server 104 where the web pages are located, and filters the to-be-processed URLs sent by the web page server 104. The network includes but is not limited to: a wide area network, a metropolitan area network, or a local area network.
According to an embodiment of the present invention, a device for filtering URLs of web pages is also provided. As shown in Fig. 7, the device includes:
1) an acquiring unit 702, configured to obtain a to-be-processed URL set, wherein the to-be-processed URL set includes the URLs of multiple web pages to be processed;
2) a filter unit 704, configured to perform the following filter operation on each URL in the to-be-processed URL set, wherein the URL in the to-be-processed URL set on which the filter operation is currently being performed is the current URL:
i) determining, according to the filter identifier in a preset configuration file, whether the current URL is a URL to be detected;
ii) when the current URL is a URL to be detected, matching the current URL according to the filter field in the configuration file;
iii) when the current URL is successfully matched according to the filter field, filtering the current URL out of the to-be-processed URL set.
Optionally, in this embodiment, the method for filtering URLs of web pages can be applied in a Web security scan process. For example, referring to Fig. 1, before the Web security scan is performed, the to-be-processed URL set is obtained, wherein the to-be-processed URL set includes the URLs of multiple web pages to be processed, and the filter operation is performed on each URL in the URL set, so that URLs corresponding to spam pages, which do not need a Web security scan, are filtered out of the massive number of URLs obtained by the filtering server 102. This is only an example, and this embodiment places no limitation on it.
Optionally, in this embodiment, referring to Fig. 3, the interaction between the filtering server 102 and the web page server 104 before the to-be-processed URL set is obtained is as follows:
S302, the filtering server 102 sends a request for obtaining the to-be-processed URL set to the web page server 104 over the network;
S304, in response to the request, the web page server 104 returns the to-be-processed URL set to the filtering server 102.
Optionally, in this embodiment, the configuration file is a file formed by a JSON character string that includes the filter identifier and the filter field, where JSON (JavaScript Object Notation) is a lightweight, text-based data interchange format that is easy for people to read and easy for machines to parse and generate. The filter identifier can include but is not limited to: the scope of application of the filtering performed on the to-be-processed URL set. For example, the scope of application can include but is not limited to: global web pages or local web pages, where local web pages can be selected by presetting a domain name. The filter field can include but is not limited to: the matching result indicating the filtering to be performed on the URL to be detected, where the filter field can include but is not limited to multiple filter subfields. For example, the matching result can include but is not limited to: a characteristic parameter used for matching and its matching mode, and a feature string used for matching and its matching mode.
For example, the filter identifier is identified by "host". When the value of "host" is "*", as shown at 402 in Fig. 4, the filtering applies to all web pages; when the value of "host" is "domain name/IP", the filtering applies to the web pages corresponding to that "domain name/IP". When the current URL on which the filter operation is being performed is determined to satisfy the filter identifier, the current URL is determined to be a URL to be detected.
As another example, as shown at 404 in Fig. 4, the filter field is identified by "rule", where "rule" can include but is not limited to the following subfields: 1) a characteristic parameter set for the status code "HttpCode"; 2) a feature string set for the message body "Content". For example, the configuration file sets the value of the status code "HttpCode" to "equal to" the number "200" and sets the character string of the message body "Content" to "http://qzone.qq.com/gy/404/data.js". When the URL to be detected successfully matches all subfields in the filter field, the URL to be detected is determined to be successfully matched, and the current URL is filtered out of the to-be-processed URL set.
Optionally, in this embodiment, the configuration file can also include but is not limited to: the type name of the configuration file and the attributes of the configuration file, where the attributes can include but are not limited to: the time the configuration file was added and the person who added it. For example, as shown at 406 in Fig. 4, the type name of the configuration file is "gongyi404"; as shown at 408 in Fig. 4, the configuration file was added on "2013-10-13" by "zhangsan".
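As a rough illustration of such a configuration file, the values quoted above can be arranged into a JSON string and parsed. Fig. 4 is not reproduced in this text, so the layout is an assumption: only the keys "host" and "rule" and the subfield names "HttpCode" and "Content" come from the description, while the keys `name`, `add_time`, and `adder` and the two-element form of each rule are hypothetical.

```python
import json

# Hypothetical layout; only "host", "rule", "HttpCode" and "Content"
# are named in the description. Other keys are illustrative.
CONFIG_JSON = """
{
  "name": "gongyi404",
  "host": "*",
  "rule": {
    "HttpCode": ["=", 200],
    "Content": ["substr", "http://qzone.qq.com/gy/404/data.js"]
  },
  "add_time": "2013-10-13",
  "adder": "zhangsan"
}
"""

config = json.loads(CONFIG_JSON)

# The filter identifier decides which URLs are examined at all;
# the filter field lists the subfields that must all match.
print(config["host"])
print(sorted(config["rule"]))
```

Keeping the matching mode next to each value (for example `["=", 200]`) is one way to record both "what to compare" and "how to compare" in a single subfield.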
Optionally, in this embodiment, after the to-be-processed URLs have been filtered, the URLs that remain once the URLs corresponding to spam pages have been filtered out are saved so that they can be called during the Web security scan, which improves the efficiency of the Web security scan.
Optionally, in this embodiment, the configuration file can be saved in the form of a hash table, and it can be saved in at least one of the following locations: a text file on disk, or a file on a database server. Optionally, the configuration file is loaded from that location only when the to-be-processed URLs need to be filtered, and is then used to filter the URLs in the to-be-processed URL set. Optionally, in this embodiment, the configuration file can be loaded by, but not limited to, traversing the hash table to look up the configuration file corresponding to the current URL.
Optionally, the preset configuration file is multiple configuration files, and the device for filtering URLs of web pages performs the steps of determining, according to the filter identifier in the preset configuration file, whether the current URL is a URL to be detected, matching the current URL according to the filter field in the configuration file, and filtering the current URL out of the to-be-processed URL set as follows: a matching configuration file is looked up among the multiple configuration files, where a matching configuration file is one whose filter identifier determines that the current URL is a URL to be detected and whose filter field successfully matches the current URL; as long as one matching configuration file is found among the multiple configuration files, the current URL is filtered out of the to-be-processed URL set.
Specifically, the process of filtering URLs of web pages is described with reference to Fig. 1 and the flowchart shown in Fig. 5. Assume that the URL set includes URL_1, URL_2, URL_3, URL_4, and URL_5, and that the multiple configuration files are P_1, P_2, and P_3.
S502, the filtering server 102 requests the to-be-processed URL set from the web page server 104;
S504, the web page server 104 returns the to-be-processed URL set to the filtering server 102;
S506, the filtering server 102 determines whether the filter operation has been performed on all URLs, where the URL set includes URL_1, URL_2, and URL_3; if some URL in the URL set has not yet undergone the filter operation, step S508 is performed; if every URL in the URL set has undergone the filter operation, this round of filtering ends;
S508, the filtering server 102 reads one URL (for example, URL_3) on which the filter operation has not yet been performed;
S510, the filtering server 102 determines whether the matching operation has been performed with all configuration files; if some configuration file has not yet been used for the matching operation, step S512 is performed; if the matching operation (S514, or S514 and S516) has been performed with all configuration files, the process returns to S506. The first time the filtering server 102 makes this determination, it first loads all configuration files (for example, configuration files P_1, P_2, and P_3);
S512, the filtering server 102 reads one configuration file (for example, configuration file P_2) that has not yet been used for the matching operation;
S514, the filtering server 102 determines whether the current URL satisfies the scope indicated by the filter identifier; if so, step S516 is performed; otherwise, the process returns to look up and evaluate a new configuration file;
S516, the filtering server 102 matches the current URL (for example, URL_3) against the subfields in the filter field one by one;
S518, the filtering server 102 determines whether the match succeeded; if so, the current URL (for example, URL_3) is determined to be a URL corresponding to a spam page and step S520 is performed; if not, the process returns to S510 to determine whether the matching operation has been performed with all configuration files (for example, configuration file P_3 has not yet been used for the matching operation);
S520, the filtering server 102 filters out the URL corresponding to the spam page and returns to step S506 to determine whether the filter operation has been performed on all URLs in the URL set.
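The control flow of steps S506 to S520 amounts to a nested loop over URLs and configuration files. The sketch below shows only that loop structure and makes several assumptions: the function name `filter_urls` is hypothetical, the two checks (filter identifier and filter field) are passed in as plain callables, and the toy predicates at the bottom are invented so the example is self-contained.

```python
from typing import Callable, List, Sequence


def filter_urls(
    urls: Sequence[str],
    configs: Sequence[str],
    applies: Callable[[str, str], bool],
    matches: Callable[[str, str], bool],
) -> List[str]:
    """Return the URLs that no configuration file marks as spam."""
    kept = []
    for url in urls:                   # S506/S508: next URL not yet filtered
        for cfg in configs:            # S510/S512: next configuration file not yet tried
            if not applies(url, cfg):  # S514: filter identifier does not cover this URL
                continue
            if matches(url, cfg):      # S516/S518: every subfield of the filter field matched
                break                  # S520: one matching file is enough, drop the URL
        else:
            kept.append(url)           # all configuration files tried, none matched
    return kept


urls = ["URL_1", "URL_2", "URL_3", "URL_4", "URL_5"]
configs = ["P_1", "P_2", "P_3"]

# Toy predicates: every file applies to every URL; only P_2 matches URL_3.
applies = lambda url, cfg: True
matches = lambda url, cfg: (url, cfg) == ("URL_3", "P_2")

print(filter_urls(urls, configs, applies, matches))
# ['URL_1', 'URL_2', 'URL_4', 'URL_5']
```

Breaking out of the inner loop on the first successful match reflects the rule that a single matching configuration file is sufficient to filter the current URL out.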
In the embodiment provided by the present invention, the filter identifier in the preset configuration file is used to select the URLs to be detected from the massive to-be-processed URL set, and the filter field in the configuration file is then used to match the URLs to be detected, yielding the URLs that need to be filtered out of the to-be-processed URL set. URLs corresponding to spam pages in the obtained URL set are thereby filtered out effectively, so that URLs corresponding to spam pages need not be scanned during the Web security scan, which improves the efficiency of the Web security scan.
As an optional scheme, the filter unit 704 includes:
1) a first judgment module, configured to determine that the current URL is a URL to be detected when the filter identifier is a field indicating that all web pages are to be filtered; or
2) a second judgment module, configured to determine, when the filter identifier is a field indicating that a preset domain name is to be filtered, whether the current URL includes the preset domain name, and to determine that the current URL is a URL to be detected if it does.
This is illustrated with the following example. As shown at 402 in Fig. 4, the filter identifier is identified by "host". When the value of "host" is "*", the filtering applies to all web pages; the URLs corresponding to all web pages are then determined to be URLs to be detected, and these URLs to be detected are used in the subsequent matching determination.
This is illustrated with the following example. In the configuration file shown in Fig. 6, when the value of "host" 602 is "www.sina.com/168.1.1.3", the filtering applies to the web pages of the Sina website. When the current URL on which the filter operation is being performed is determined to include the preset domain name, the current URL is determined to be a URL to be detected.
Optionally, in this embodiment, if the current URL does not fall within the scope indicated by the filter identifier, the current URL is not a URL corresponding to a spam page, and the remaining determinations of the filter operation are not performed.
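The filter-identifier check described above can be sketched as a small function. This is an illustration under assumptions: the function name `is_url_to_be_detected` is hypothetical, the "domain name/IP" value is read as a slash-separated list of alternatives, and "includes the preset domain name" is implemented as plain substring containment.

```python
def is_url_to_be_detected(url: str, host: str) -> bool:
    """Decide whether `url` falls within the scope given by the "host" value."""
    if host == "*":
        # "*" means the filtering applies to all web pages.
        return True
    # Assumed reading of "domain name/IP": any listed name found in the URL.
    return any(name and name in url for name in host.split("/"))


print(is_url_to_be_detected("http://www.sina.com/news", "*"))
print(is_url_to_be_detected("http://www.sina.com/news", "www.sina.com/168.1.1.3"))
print(is_url_to_be_detected("http://example.org/", "www.sina.com/168.1.1.3"))
```

A URL that fails this check is skipped without evaluating the filter field, which matches the short-circuit behavior described in the preceding paragraph.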
In the embodiment provided by the present invention, the to-be-processed URL set is screened by the filter identifier to obtain the URLs to be detected within the corresponding scope, and the URLs corresponding to spam pages are then filtered out within that scope. The filter operation is thus performed on URLs within a preset scope, which improves the accuracy of URL filtering.
As an optional scheme, the filter unit 704 includes:
1) a matching module, configured to perform the matching operation indicated in the filter field on the current URL;
2) a third judgment module, configured to determine, according to whether the result of the matching operation satisfies the matching result indicated in the filter field, whether the current URL is successfully matched.
Optionally, in this embodiment, the matching operation indicated in the filter field includes but is not limited to: characteristic-parameter matching and feature-string matching. The characteristic parameter can include but is not limited to at least one of: the status code of the web page corresponding to the current URL, and/or the content-length field indicating the size of the web page corresponding to the current URL. The feature string can include but is not limited to at least one of: the link of the current URL, or a partial character string in the link of the current URL.
Optionally, in this embodiment, a characteristic parameter and/or a feature string, together with the corresponding matching mode, can be set in the filter field so that the matching operation can be performed on the URL to be detected. Optionally, in this embodiment, the matching mode of the characteristic parameter can include but is not limited to: greater than the value of the characteristic parameter, less than the value of the characteristic parameter, or equal to the value of the characteristic parameter. In this embodiment, the matching mode of the feature string can include but is not limited to: search matching or regular-expression matching.
For example, the current URL on which the filter operation is being performed is matched against the matching result indicated in the filter field; if the result of performing the matching operation on the current URL satisfies the matching result indicated in the filter field, the current URL is determined to be successfully matched.
In the embodiment provided by the present invention, the matching operation is performed on the URL to be detected using the filter field to further determine whether the URL to be detected corresponds to a spam page. URLs corresponding to spam pages are thereby identified accurately, which improves the efficiency of the Web security scan and reduces its cost.
As an optional scheme, the matching module performs the matching operation indicated in the filter field on the current URL through the following step: determining whether the characteristic parameter in the current URL and the characteristic parameter in the filter field satisfy the matching condition in the configuration file, wherein the characteristic parameter includes: the status code of the web page corresponding to the current URL, and/or the content-length field indicating the size of the web page corresponding to the current URL. The third judgment module determines, according to whether the result of the matching operation satisfies the matching result indicated in the filter field, whether the current URL is successfully matched through the following step: if the characteristic parameter in the current URL and the characteristic parameter in the filter field satisfy the matching condition in the configuration file, determining that the current URL is successfully matched.
This is illustrated with the following example. Referring to Fig. 6, assume that the matching operation indicated in the filter field for the current URL is a characteristic-parameter match; it is then determined whether the characteristic parameter in the current URL and the characteristic parameter in the filter field satisfy the predetermined matching condition. For example, the characteristic parameter in the filter field is the status code "HttpCode" and the predetermined matching condition is "=, 200"; it is then determined whether the characteristic parameter in the current URL (that is, the status code "HttpCode") is equal to 200. If the status code "HttpCode" in the current URL satisfies the matching condition, the current URL is determined to be successfully matched.
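The characteristic-parameter comparison can be sketched with a small operator table. The function name `match_parameter` and the use of ">" and "<" as operator symbols are assumptions for this sketch; the description only shows "=" explicitly and names the greater-than and less-than modes in words.

```python
import operator

# Matching modes named in the description: equal to, greater than, less than.
# The ">" and "<" symbols are assumed spellings.
COMPARATORS = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def match_parameter(actual: int, op: str, expected: int) -> bool:
    """Compare a value observed for the page with the configured value."""
    return COMPARATORS[op](actual, expected)


# Status code of the fetched page against the condition "=, 200".
print(match_parameter(200, "=", 200))
# Content length of the fetched page against a hypothetical "<, 512" condition.
print(match_parameter(1024, "<", 512))
```

The same function serves both parameters mentioned in the text, since the status code and the content length are both plain integers.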
As an optional scheme, the matching module performs the matching operation indicated in the filter field on the current URL through the following step: determining whether the feature string in the current URL and the feature string in the filter field satisfy the matching condition in the configuration file, wherein the feature string includes at least one of: the link of the current URL, or a partial character string in the link of the current URL. The third judgment module determines, according to whether the result of the matching operation satisfies the matching result indicated in the filter field, whether the current URL is successfully matched through the following step: if the feature string in the current URL and the feature string in the filter field satisfy the matching condition in the configuration file, determining that the current URL is successfully matched.
This is illustrated with the following example. Referring to Fig. 6, assume that the matching operation indicated in the filter field for the current URL is a feature-string match; it is then determined whether the feature string in the current URL and the feature string in the filter field satisfy the predetermined matching condition. For example, the feature string in the filter field is the message body "Content", and the predetermined matching condition is "substr=, stc="http://news.sina.com/gj/303/data.js""; it is then determined whether the feature string in the current URL (that is, the message body "Content") satisfies this matching condition, for example the link character string indicating world news in Sina News shown in Fig. 6. If the feature string is found in the current URL, the current URL is determined to be successfully matched.
As another example, regular-expression matching is configured: a partial character string of the complete character string is used as the feature string, and regular-expression matching is used to determine whether the current URL contains the feature string set in the regular expression. URLs containing a specific partial character string can thus be filtered, so that the filtering of the current URL in this embodiment can target a whole class of URLs that contain a specific kind of character string.
In the embodiment provided by the present invention, the characteristic parameter and/or feature string of the current URL is matched against the characteristic parameter and/or feature string set in the filter field according to the predetermined matching mode. The URLs corresponding to spam pages among the URLs to be detected are thereby identified accurately and filtered out, and the Web security scan is then performed on the remaining URLs, which improves the efficiency of the Web security scan.
As an optional scheme, the preset configuration file is multiple configuration files, and the filter unit 704 includes:
1) a lookup module, configured to look up a matching configuration file among the multiple configuration files, where a matching configuration file is one whose filter identifier determines that the current URL is a URL to be detected and whose filter field successfully matches the current URL;
2) a filter module, configured to filter the current URL out of the to-be-processed URL set as long as one matching configuration file is found among the multiple configuration files.
This is illustrated with reference to Fig. 5. According to step S518 of the above process, if the current URL (for example, URL_3) fails to match configuration file P_2, which is currently being used for matching, the process returns to look up and evaluate a new configuration file: the lookup module searches the multiple configuration files again and reads a configuration file that has not yet been used for the matching operation, for example configuration file P_3. If the current URL (for example, URL_3) successfully matches configuration file P_2, that is, configuration file P_2 is the matching configuration file found by the lookup module, the filter module performs step S520 and filters out the current URL (for example, URL_3), which has been determined to correspond to a spam page.
As an optional scheme, the device further includes:
1) a scanning unit, configured to perform, after the filter operation has been performed on each URL in the to-be-processed URL set, a web page security scan operation on the web page to be processed indicated by each URL in the to-be-processed URL set from which the current URL has been filtered out.
This is illustrated with the following example. After the URLs corresponding to spam pages have been filtered out, the URLs remaining in the URL set are saved; when the web page security scan operation is performed, the saved URLs, which contain no spam pages, are called directly.
In the embodiment provided by the present invention, the web page security scan operation is performed on the web page to be processed indicated by each URL in the to-be-processed URL set from which the URLs corresponding to spam pages have been filtered out. This avoids performing the web page security scan operation on URLs corresponding to spam pages and improves the efficiency of the Web security scan.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
Embodiment 3
According to an embodiment of the present invention, a server for implementing the above filtering of URLs of web pages is also provided. As shown in Fig. 8, the server includes:
1) a memory 802, configured to store the configuration file used to filter the URLs of web pages and the URLs obtained after filtering.
Optionally, in this embodiment, the content stored in the memory 802 can also be obtained from servers other than the filtering server 102; this embodiment places no limitation on this.
Optionally, in this embodiment, the memory 802 can also be used to store other data stored during the filtering process in Embodiment 1 above.
2) a processor 804, configured to perform the following operations of the modules in the above device for filtering URLs of web pages:
S1, obtaining a to-be-processed URL set, wherein the to-be-processed URL set includes the URLs of multiple web pages to be processed;
S2, performing the following filter operation on each URL in the to-be-processed URL set, wherein the URL in the to-be-processed URL set on which the filter operation is currently being performed is the current URL:
S20, determining, according to the filter identifier in a preset configuration file, whether the current URL is a URL to be detected;
S22, if the current URL is a URL to be detected, matching the current URL according to the filter field in the configuration file;
S24, if the current URL is successfully matched according to the filter field, filtering the current URL out of the to-be-processed URL set.
Optionally, in this embodiment, the processor 804 is also configured to perform the following operations to determine, according to the filter identifier in the preset configuration file, whether the current URL is a URL to be detected:
S1, if the filter identifier is a field indicating that all web pages are to be filtered, determining that the current URL is a URL to be detected; or
S2, if the filter identifier is a field indicating that a preset domain name is to be filtered, determining whether the current URL includes the preset domain name, and if the current URL includes the preset domain name, determining that the current URL is a URL to be detected.
Optionally, in this embodiment, the processor 804 is also configured to perform the following operations to match the current URL according to the filter field in the configuration file:
S1, performing the matching operation indicated in the filter field on the current URL;
S2, determining, according to whether the result of the matching operation satisfies the matching result indicated in the filter field, whether the current URL is successfully matched.
3) a communication interface 806, configured to exchange data with the web page server 104.
Optionally, in this embodiment, the communication interface 806 is also configured to exchange data with servers other than the web page server 104 while the URLs of web pages are being filtered.
Optionally, for specific examples in this embodiment, refer to the examples described in Embodiment 1 and Embodiment 2 above; details are not repeated here.
Embodiment 4
According to an embodiment of the present invention, a storage medium is provided, which can be applied in the hardware environment shown in Fig. 1. Optionally, the storage medium can be, but is not limited to being, located in the filtering server 102 that filters the URLs of web pages.
Optionally, in this embodiment, the storage medium can be applied in the filtering of URLs of web pages.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
S1: obtain a URL set to be processed, wherein the URL set to be processed includes the URLs of a plurality of web pages to be processed;
S2: perform the following filter operation on each URL in the URL set to be processed, wherein the URL on which the filter operation is currently being performed is the current URL:
S20: determine, according to the filter identifier in a preset configuration file, whether the current URL is a URL to be detected;
S22: if the current URL is a URL to be detected, match the current URL according to the filter field in the configuration file;
S24: if the current URL is matched successfully according to the filter field, filter the current URL out of the URL set to be processed.
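Steps S1 to S24 can be sketched as a simple loop. This is a minimal illustration only, not the patented implementation: the function name `filter_urls` and the two callbacks standing in for steps S20 and S22 are hypothetical, and the sample URLs are made up.

```python
from typing import Callable, Iterable, List


def filter_urls(
    urls: Iterable[str],
    is_to_be_detected: Callable[[str], bool],     # stands in for step S20 (hypothetical)
    matches_filter_field: Callable[[str], bool],  # stands in for step S22 (hypothetical)
) -> List[str]:
    """Return the URLs that remain after the filter operation of step S2."""
    kept = []
    for current_url in urls:  # S2: each URL in turn is the "current URL"
        if is_to_be_detected(current_url) and matches_filter_field(current_url):
            continue  # S24: the current URL is filtered out of the set
        kept.append(current_url)
    return kept


# S1: a made-up URL set to be processed.
pending = [
    "http://example.com/a",
    "http://example.com/spam/1",
    "http://other.test/spam/2",
]
remaining = filter_urls(
    pending,
    is_to_be_detected=lambda u: "example.com" in u,
    matches_filter_field=lambda u: "/spam/" in u,
)
print(remaining)
```

Only URLs that are both selected for detection and successfully matched are dropped; a URL that fails either check stays in the set.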
Optionally, the storage medium is further configured to store program code for performing the following steps in order to determine, according to the filter identifier in the preset configuration file, whether the current URL is a URL to be detected:
S1: if the filter identifier is a field indicating that all web pages are to be filtered, determine that the current URL is a URL to be detected; or
S2: if the filter identifier is a field indicating that a preset domain name is to be filtered, determine whether the current URL includes the preset domain name, and if the current URL includes the preset domain name, determine that the current URL is a URL to be detected.
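The two cases of the filter identifier can be sketched as below. The marker `"*"` for "all web pages" and the function name are assumptions made for illustration; the text does not specify how the identifier is encoded. The sketch also compares the host part of the URL with the preset domain name, which is a stricter reading of "includes the preset domain name" than a plain substring test.

```python
from urllib.parse import urlsplit

ALL_PAGES = "*"  # hypothetical marker meaning "filter all web pages"


def is_to_be_detected(current_url: str, filter_identifier: str) -> bool:
    """Decide whether the current URL is a URL to be detected."""
    if filter_identifier == ALL_PAGES:
        return True  # case S1: every URL is a URL to be detected
    # Case S2: the identifier is a preset domain name; compare it with the host.
    host = urlsplit(current_url).hostname or ""
    return host == filter_identifier or host.endswith("." + filter_identifier)


print(is_to_be_detected("http://www.example.com/x", "example.com"))
```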
Optionally, the storage medium is further configured to store program code for performing the following steps in order to match the current URL according to the filter field in the configuration file:
S1: perform, on the current URL, the matching operation indicated in the filter field;
S2: determine whether the current URL has been matched successfully according to whether the result of the matching operation satisfies the matching result indicated in the filter field.
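One way to read these two steps is that the filter field names both an operation and the result that counts as a match. The sketch below follows that reading; the keys `operation`, `operand`, and `expected` and the two operation names are hypothetical and do not come from the patent text.

```python
import re
from typing import Callable, Dict

# Hypothetical catalogue of matching operations a filter field may name.
OPERATIONS: Dict[str, Callable[[str, str], bool]] = {
    "contains": lambda url, operand: operand in url,
    "regex": lambda url, operand: re.search(operand, url) is not None,
}


def match_current_url(current_url: str, filter_field: dict) -> bool:
    """Return True when the current URL is matched successfully."""
    operation = OPERATIONS[filter_field["operation"]]
    result = operation(current_url, filter_field["operand"])  # S1: run the operation
    return result == filter_field["expected"]                 # S2: compare with the indicated result


field = {"operation": "contains", "operand": "/logout", "expected": True}
print(match_current_url("http://example.com/logout?x=1", field))
```

Keeping the expected result separate from the operation lets a filter field also express "URLs for which the operation does not hold".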
Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
Optionally, for specific examples of this embodiment, refer to the examples described in Embodiment 1 and Embodiment 2 above; they are not repeated here.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
If the integrated unit in the above embodiments is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (16)
1. A method for filtering the uniform resource locators (URLs) of web pages, characterized by comprising:
obtaining a URL set to be processed, wherein the URL set to be processed includes the URLs of a plurality of web pages to be processed;
performing the following filter operation on each URL in the URL set to be processed, wherein the URL in the URL set to be processed on which the filter operation is currently being performed is the current URL:
determining, according to the filter identifier in a preset configuration file, whether the current URL is a URL to be detected;
if the current URL is the URL to be detected, matching the current URL according to the filter field in the configuration file;
if the current URL is matched successfully according to the filter field, filtering the current URL out of the URL set to be processed.
2. The method according to claim 1, characterized in that determining, according to the filter identifier in the preset configuration file, whether the current URL is a URL to be detected comprises:
if the filter identifier is a field indicating that all web pages are to be filtered, determining that the current URL is the URL to be detected; or
if the filter identifier is a field indicating that a preset domain name is to be filtered, determining whether the current URL includes the preset domain name, and if the current URL includes the preset domain name, determining that the current URL is the URL to be detected.
3. The method according to claim 1, characterized in that matching the current URL according to the filter field in the configuration file comprises:
performing, on the current URL, the matching operation indicated in the filter field;
determining whether the current URL has been matched successfully according to whether the result of the matching operation satisfies the matching result indicated in the filter field.
4. The method according to claim 3, characterized in that:
performing, on the current URL, the matching operation indicated in the filter field comprises: determining whether the characteristic parameter of the current URL and the characteristic parameter in the filter field satisfy the matching condition in the configuration file, wherein the characteristic parameter comprises: the status code of the web page corresponding to the current URL, and/or the Content-Length field indicating the size of the web page corresponding to the current URL;
determining whether the current URL has been matched successfully according to whether the result of the matching operation satisfies the matching result indicated in the filter field comprises: if the characteristic parameter of the current URL and the characteristic parameter in the filter field satisfy the matching condition in the configuration file, determining that the current URL has been matched successfully.
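The characteristic-parameter matching of claim 4 can be sketched as follows. The sketch assumes the status code and Content-Length of the page have already been obtained; the `PageInfo` type, the rule layout, and the condition names `eq`, `lt`, and `gt` are hypothetical, since the claim does not say how the matching condition is expressed.

```python
import operator
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Characteristic parameters of the web page behind the current URL."""
    status_code: int
    content_length: int


# Hypothetical matching conditions a configuration file might name.
CONDITIONS = {"eq": operator.eq, "lt": operator.lt, "gt": operator.gt}


def matches_characteristics(page: PageInfo, filter_field: dict) -> bool:
    """True when every characteristic parameter named in the filter field matches."""
    if not filter_field:
        return False  # nothing to match against, so no successful match
    return all(
        CONDITIONS[rule["condition"]](getattr(page, name), rule["value"])
        for name, rule in filter_field.items()
    )


# A made-up rule: small 404 pages are treated as junk pages.
rule = {
    "status_code": {"condition": "eq", "value": 404},
    "content_length": {"condition": "lt", "value": 512},
}
print(matches_characteristics(PageInfo(404, 120), rule))
```

Because the filter field is iterated key by key, it may name the status code, the Content-Length, or both, which mirrors the "and/or" wording of the claim.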
5. The method according to claim 3, characterized in that:
performing, on the current URL, the matching operation indicated in the filter field comprises: determining whether the feature string of the current URL and the feature string in the filter field satisfy the matching condition in the configuration file, wherein the feature string comprises at least one of: the link of the current URL, and a partial string of the link of the current URL;
determining whether the current URL has been matched successfully according to whether the result of the matching operation satisfies the matching result indicated in the filter field comprises: if the feature string of the current URL and the feature string in the filter field satisfy the matching condition in the configuration file, determining that the current URL has been matched successfully.
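The feature-string matching of claim 5 can be sketched in the same spirit. Here the feature string is either the whole link or one part of it; the keys `target`, `condition`, and `value`, and the choice of the path and query as the "partial strings", are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical matching conditions between two strings.
STRING_CONDITIONS = {
    "equals": lambda text, value: text == value,
    "prefix": lambda text, value: text.startswith(value),
    "suffix": lambda text, value: text.endswith(value),
    "contains": lambda text, value: value in text,
}


def feature_string(current_url: str, target: str) -> str:
    """Pick the whole link or a partial string of it."""
    parts = urlsplit(current_url)
    if target == "link":
        return current_url
    if target == "path":
        return parts.path
    if target == "query":
        return parts.query
    raise ValueError(f"unknown target: {target}")


def matches_feature_string(current_url: str, filter_field: dict) -> bool:
    text = feature_string(current_url, filter_field["target"])
    return STRING_CONDITIONS[filter_field["condition"]](text, filter_field["value"])


url = "http://example.com/ads/banner.gif?id=3"
print(matches_feature_string(url, {"target": "path", "condition": "prefix", "value": "/ads/"}))
```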
6. The method according to claim 1, characterized in that the configuration file is a file formed by a JSON string that includes the filter identifier and the filter field.
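A configuration file of the kind described in claim 6 might look like the JSON string below. The key names and values are hypothetical; the claim only requires that the JSON string contain a filter identifier and a filter field.

```python
import json

# Hypothetical configuration content; key names are illustrative only.
CONFIG_TEXT = """
{
  "filter_identifier": "example.com",
  "filter_field": {
    "operation": "contains",
    "operand": "/calendar/",
    "expected": true
  }
}
"""

config = json.loads(CONFIG_TEXT)
print(config["filter_identifier"])
print(config["filter_field"]["operation"])
```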
7. The method according to any one of claims 1 to 6, characterized in that the preset configuration file comprises a plurality of configuration files, wherein the steps of determining, according to the filter identifier in the preset configuration file, whether the current URL is a URL to be detected, matching the current URL according to the filter field in the configuration file, and filtering the current URL out of the URL set to be processed are performed as follows:
searching the plurality of configuration files for a matching configuration file, wherein a matching configuration file is one whose filter identifier indicates that the current URL is the URL to be detected and whose filter field successfully matches the current URL;
as long as the matching configuration file is found among the plurality of configuration files, filtering the current URL out of the URL set to be processed.
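With several configuration files, a URL is filtered out as soon as any one of them both selects and matches it. A minimal sketch of that rule follows; the configuration layout (a `filter_identifier` that is either `"*"` or a domain name, plus a single `keyword`) is a deliberately simplified assumption.

```python
from typing import Iterable, List


def config_matches(current_url: str, config: dict) -> bool:
    """True when this configuration both selects and matches the current URL."""
    identifier = config["filter_identifier"]
    to_be_detected = identifier == "*" or identifier in current_url
    return to_be_detected and config["keyword"] in current_url


def filter_with_configs(urls: Iterable[str], configs: List[dict]) -> List[str]:
    """Keep only URLs for which no matching configuration file is found."""
    return [u for u in urls if not any(config_matches(u, c) for c in configs)]


configs = [
    {"filter_identifier": "*", "keyword": "/logout"},
    {"filter_identifier": "example.com", "keyword": "/ads/"},
]
urls = [
    "http://example.com/ads/1",
    "http://other.test/ads/1",
    "http://other.test/logout",
    "http://example.com/home",
]
kept = filter_with_configs(urls, configs)
print(kept)
```

`any` stops at the first matching configuration, which corresponds to the "as long as a matching configuration file is found" wording.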
8. The method according to any one of claims 1 to 6, characterized in that, after the filter operation has been performed on each URL in the URL set to be processed, the method further comprises:
performing a web page security scan operation on the web pages to be processed that are indicated by the URLs remaining in the URL set to be processed after the current URL has been filtered out.
9. A device for filtering the uniform resource locators (URLs) of web pages, characterized by comprising:
an acquiring unit, configured to obtain a URL set to be processed, wherein the URL set to be processed includes the URLs of a plurality of web pages to be processed;
a filter unit, configured to perform the following filter operation on each URL in the URL set to be processed, wherein the URL in the URL set to be processed on which the filter operation is currently being performed is the current URL:
determining, according to the filter identifier in a preset configuration file, whether the current URL is a URL to be detected;
when the current URL is the URL to be detected, matching the current URL according to the filter field in the configuration file;
when the current URL is matched successfully according to the filter field, filtering the current URL out of the URL set to be processed.
10. The device according to claim 9, characterized in that the filter unit comprises:
a first judgment module, configured to determine that the current URL is the URL to be detected when the filter identifier is a field indicating that all web pages are to be filtered; or
a second judgment module, configured to determine, when the filter identifier is a field indicating that a preset domain name is to be filtered, whether the current URL includes the preset domain name, and if the current URL includes the preset domain name, to determine that the current URL is the URL to be detected.
11. The device according to claim 9, characterized in that the filter unit comprises:
a matching module, configured to perform, on the current URL, the matching operation indicated in the filter field;
a third judgment module, configured to determine whether the current URL has been matched successfully according to whether the result of the matching operation satisfies the matching result indicated in the filter field.
12. The device according to claim 11, characterized in that:
the matching module performs, on the current URL, the matching operation indicated in the filter field by: determining whether the characteristic parameter of the current URL and the characteristic parameter in the filter field satisfy the matching condition in the configuration file, wherein the characteristic parameter comprises: the status code of the web page corresponding to the current URL, and/or the Content-Length field indicating the size of the web page corresponding to the current URL;
the third judgment module determines whether the current URL has been matched successfully according to whether the result of the matching operation satisfies the matching result indicated in the filter field by: if the characteristic parameter of the current URL and the characteristic parameter in the filter field satisfy the matching condition in the configuration file, determining that the current URL has been matched successfully.
13. The device according to claim 11, characterized in that:
the matching module performs, on the current URL, the matching operation indicated in the filter field by: determining whether the feature string of the current URL and the feature string in the filter field satisfy the matching condition in the configuration file, wherein the feature string comprises at least one of: the link of the current URL, and a partial string of the link of the current URL;
the third judgment module determines whether the current URL has been matched successfully according to whether the result of the matching operation satisfies the matching result indicated in the filter field by: if the feature string of the current URL and the feature string in the filter field satisfy the matching condition in the configuration file, determining that the current URL has been matched successfully.
14. The device according to claim 9, characterized in that the configuration file is a file formed by a JSON string that includes the filter identifier and the filter field.
15. The device according to any one of claims 9 to 14, characterized in that the preset configuration file comprises a plurality of configuration files, wherein the filter unit comprises:
a searching module, configured to search the plurality of configuration files for a matching configuration file, wherein a matching configuration file is one whose filter identifier indicates that the current URL is the URL to be detected and whose filter field successfully matches the current URL;
a filtering module, configured to filter the current URL out of the URL set to be processed as long as the matching configuration file is found among the plurality of configuration files.
16. The device according to any one of claims 9 to 14, characterized by further comprising:
a scanning unit, configured to perform, after the filter operation has been performed on each URL in the URL set to be processed, a web page security scan operation on the web pages to be processed that are indicated by the URLs remaining in the URL set to be processed after the current URL has been filtered out.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410284750.6A CN105302815B (en) | 2014-06-23 | 2014-06-23 | The filter method and device of the uniform resource position mark URL of webpage |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410284750.6A CN105302815B (en) | 2014-06-23 | 2014-06-23 | The filter method and device of the uniform resource position mark URL of webpage |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105302815A CN105302815A (en) | 2016-02-03 |
CN105302815B true CN105302815B (en) | 2019-06-07 |
Family
ID=55200091
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410284750.6A Active CN105302815B (en) | 2014-06-23 | 2014-06-23 | The filter method and device of the uniform resource position mark URL of webpage |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105302815B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106227741B (en) * | 2016-07-12 | 2019-08-30 | 国家计算机网络与信息安全管理中心 | A kind of extensive URL matching process based on multilevel hash index chained list |
CN106168977B (en) * | 2016-07-15 | 2019-07-02 | 山谷网安科技股份有限公司 | A kind of column recognition methods for web portal security monitoring |
CN107066510B (en) * | 2017-01-22 | 2021-12-03 | 南方科技大学 | Information processing method and device |
CN108595586B (en) * | 2018-04-19 | 2021-12-24 | 杭州迪普科技股份有限公司 | Method and device for determining search keywords |
CN109639686B (en) * | 2018-12-17 | 2022-02-25 | 江苏满运软件科技有限公司 | Distributed webpage filtering method and device, electronic equipment and storage medium |
CN111259282B (en) * | 2020-02-13 | 2023-08-29 | 深圳市腾讯计算机系统有限公司 | URL (Uniform resource locator) duplication removing method, device, electronic equipment and computer readable storage medium |
CN113411332B (en) * | 2021-06-18 | 2022-10-04 | 杭州安恒信息技术股份有限公司 | CORS vulnerability detection method, device, equipment and medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1798147A (en) * | 2004-12-28 | 2006-07-05 | 华为技术有限公司 | Method for matching uniform resource locator |
CN102110132A (en) * | 2010-12-08 | 2011-06-29 | 北京星网锐捷网络技术有限公司 | Uniform resource locator matching and searching method, device and network equipment |
CN102780681A (en) * | 2011-05-11 | 2012-11-14 | 中兴通讯股份有限公司 | URL (Uniform Resource Locator) filtering system and URL filtering method |
CN103793462A (en) * | 2013-12-02 | 2014-05-14 | 北京奇虎科技有限公司 | URL (uniform resource locator) purifying method and device |
Non-Patent Citations (1)
Title |
---|
基于综合接入设备的防火墙研究与实现;刘涛;《中国优秀硕士学位论文全文数据库》;20090615(第2009年06期);全文 |
Also Published As
Publication number | Publication date |
---|---|
CN105302815A (en) | 2016-02-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||