CN105302815B - Method and device for filtering uniform resource locators (URLs) of webpages - Google Patents

Method and device for filtering uniform resource locators (URLs) of webpages

Info

Publication number
CN105302815B
Authority
CN
China
Prior art keywords
url
current
configuration file
matching
current url
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410284750.6A
Other languages
Chinese (zh)
Other versions
CN105302815A (en
Inventor
何双宁
董昭
马杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201410284750.6A priority Critical patent/CN105302815B/en
Publication of CN105302815A publication Critical patent/CN105302815A/en
Application granted granted Critical
Publication of CN105302815B publication Critical patent/CN105302815B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method and a device for filtering the uniform resource locators (URLs) of webpages. The method comprises: obtaining a to-be-processed URL set, wherein the set includes the URLs of multiple webpages to be processed; and performing the following filter operation on each URL in the set, wherein the URL on which the filter operation is currently being performed is the current URL: judging, according to a filter identifier in a preset configuration file, whether the current URL is a URL to be detected; if the current URL is a URL to be detected, matching the current URL according to a filter field in the configuration file; and if the current URL is successfully matched according to the filter field, filtering the current URL out of the to-be-processed URL set. The invention solves the technical problem that the prior art cannot filter out the URLs of spam pages, so that Web security scanning is performed after those URLs have been filtered out, which improves the efficiency of Web security scanning.

Description

Method and device for filtering uniform resource locators (URLs) of webpages
Technical field
The present invention relates to the field of computers, and in particular to a method and a device for filtering the uniform resource locators (URLs) of webpages.
Background
When performing Web security scanning on Common Gateway Interface (CGI) endpoints, it is generally necessary to collect as many CGIs as possible and to filter out the spam pages among them, in order to improve the efficiency of the scan. At present, those skilled in the art usually collect CGIs in two ways: first, by crawling URLs on the Internet with a web crawler; second, by obtaining CGIs from bypass WAF traffic. However, both collection methods unavoidably gather many spam pages, where a spam page may be a webpage that cannot be accessed or does not exist. These spam pages are meaningless for Web security scanning and can significantly reduce its efficiency.
As the number of collected CGIs keeps growing, the number of spam pages gathered by the above collection methods grows with it. Quickly identifying spam pages among a massive number of URLs during Web security scanning, and filtering out the URLs corresponding to those spam pages, therefore becomes particularly important.
However, no effective solution to this problem has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a method and a device for filtering the uniform resource locators (URLs) of webpages, to at least solve the technical problem that the prior art cannot filter out the URLs of spam pages.
According to one aspect of the embodiments of the present invention, a method for filtering the URLs of webpages is provided, comprising: obtaining a to-be-processed URL set, wherein the set includes the URLs of multiple webpages to be processed; and performing the following filter operation on each URL in the set, wherein the URL on which the filter operation is currently being performed is the current URL: judging, according to a filter identifier in a preset configuration file, whether the current URL is a URL to be detected; if the current URL is a URL to be detected, matching the current URL according to a filter field in the configuration file; and if the current URL is successfully matched according to the filter field, filtering the current URL out of the to-be-processed URL set.
According to another aspect of the embodiments of the present invention, a device for filtering the URLs of webpages is also provided, comprising: an acquiring unit, configured to obtain a to-be-processed URL set, wherein the set includes the URLs of multiple webpages to be processed; and a filter unit, configured to perform the following filter operation on each URL in the set, wherein the URL on which the filter operation is currently being performed is the current URL: judging, according to a filter identifier in a preset configuration file, whether the current URL is a URL to be detected; when the current URL is a URL to be detected, matching the current URL according to a filter field in the configuration file; and when the current URL is successfully matched according to the filter field, filtering the current URL out of the to-be-processed URL set.
In the embodiments of the present invention, the obtained to-be-processed URLs are filtered using a configuration file that includes at least a filter identifier and a filter field. The filter identifier is used to judge whether a to-be-processed URL is a URL to be detected, which provides a preliminary screening of the URLs. The URLs to be detected are then matched against the filter field, and the URLs that match successfully are filtered out. As a result, URLs corresponding to unnecessary spam pages are no longer scanned during Web security scanning, which improves the efficiency of the scan and solves the technical problem that the prior art cannot filter out the URLs of spam pages.
In addition, matching the URLs to be detected against the characteristic parameters and/or feature strings in the filter field, in a predetermined matching manner, allows URLs to be filtered precisely, which improves the accuracy of webpage URL filtering.
Brief description of the drawings
The drawings described herein provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the hardware environment of an optional method for filtering the URLs of webpages according to an embodiment of the present invention;
Fig. 2 is a flowchart of an optional method for filtering the URLs of webpages according to an embodiment of the present invention;
Fig. 3 is a flowchart of an optional method for obtaining the URLs of webpages according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a configuration file in an optional method for filtering the URLs of webpages according to an embodiment of the present invention;
Fig. 5 is a flowchart of another optional method for filtering the URLs of webpages according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of a configuration file in another optional method for filtering the URLs of webpages according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of an optional device for filtering the URLs of webpages according to an embodiment of the present invention; and
Fig. 8 is a schematic diagram of a server that uses an optional method for filtering the URLs of webpages according to an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and not to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in an order other than the one illustrated or described. In addition, the terms "include" and "have", and any variants of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units that are explicitly listed, and may include other steps or units that are not explicitly listed or that are inherent to the process, method, product, or device.
Embodiment 1
According to an embodiment of the present invention, a method for filtering the URLs of webpages is provided. The method can be applied in the hardware environment shown in Fig. 1, in which a filtering server 102 that filters the URLs of webpages can establish a link, over a network, with the web page server 104 on which the webpages reside, and filter the to-be-processed URLs sent by the web page server 104. The network includes but is not limited to a wide area network, a metropolitan area network, or a local area network.
Optionally, as shown in Fig. 2, the method for filtering the URLs of webpages in this embodiment includes:
S202: obtain a to-be-processed URL set, wherein the set includes the URLs of multiple webpages to be processed;
S204: perform the following filter operation on each URL in the to-be-processed URL set, wherein the URL on which the filter operation is currently being performed is the current URL:
S2042: judge, according to the filter identifier in a preset configuration file, whether the current URL is a URL to be detected;
S2044: if the current URL is a URL to be detected, match the current URL according to the filter field in the configuration file;
S2046: if the current URL is successfully matched according to the filter field, filter the current URL out of the to-be-processed URL set;
S2048: if the current URL is not a URL to be detected, or if the current URL is not successfully matched according to the filter field, do not filter the current URL out of the to-be-processed URL set.
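Steps S202 to S2048 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `filter_url_set`, the callable `fetch_status`, and the toy URLs are all hypothetical, the scope check is reduced to a substring test, and the filter field is reduced to a single status-code comparison.

```python
def filter_url_set(urls, config, fetch_status):
    """Return the URLs that remain after the S204 filter operation.

    fetch_status is a hypothetical callable mapping a URL to an HTTP status code.
    """
    remaining = []
    host = config["host"]
    for current_url in urls:                              # S204: visit each URL in turn
        # S2042: the filter identifier decides whether this URL is "to be detected".
        to_detect = host == "*" or host in current_url
        # S2044: only URLs to be detected are matched against the filter field.
        matched = to_detect and fetch_status(current_url) == config["rule"]["HttpCode"]
        if not matched:
            remaining.append(current_url)                 # S2048: keep the URL
        # S2046: a matched URL is filtered out by not copying it into `remaining`.
    return remaining


# Toy data standing in for real HTTP responses (assumed values).
statuses = {"http://a.example/ok": 200, "http://a.example/gone": 404}
config = {"host": "a.example", "rule": {"HttpCode": 404}}
remaining = filter_url_set(list(statuses), config, statuses.get)
print(remaining)
```

Building a new list rather than deleting from the input keeps the original to-be-processed set intact, which is a design choice of this sketch rather than something the text prescribes.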
Optionally, in this embodiment, the method can be applied in the process of Web security scanning. For example, with reference to Fig. 1, before the Web security scan is performed, the to-be-processed URL set is obtained, wherein the set includes the URLs of multiple webpages to be processed, and the filter operation is performed on each URL in the set, so that the URLs of spam pages that do not need a Web security scan are filtered out of the massive number of URLs obtained by the filtering server 102. This is only an example, and this embodiment places no limitation on it.
Optionally, in this embodiment, with reference to Fig. 3, the interaction between the filtering server 102 and the web page server 104 when the to-be-processed URL set is obtained is as follows:
S302: the filtering server 102 sends, over the network, a request to the web page server 104 to obtain the to-be-processed URL set;
S304: in response to the request, the web page server 104 returns the to-be-processed URL set to the filtering server 102.
Optionally, in this embodiment, the configuration file is a file formed from a JSON string that includes the filter identifier and the filter field. JSON (JavaScript Object Notation) is a lightweight data-interchange format; being text-based, it is easy for people to read and easy for machines to parse and generate. The filter identifier may include but is not limited to the scope to which filtering of the to-be-processed URL set applies. For example, the scope may include but is not limited to global webpages or local webpages, where local webpages can be selected by presetting a domain name. The filter field may include but is not limited to an indication of the matching result used when filtering the URL to be detected, and may contain multiple filter subfields. For example, the matching result may include but is not limited to a characteristic parameter to be matched and its matching manner, or a feature string to be matched and its matching manner.
For example, the filter identifier is denoted by "host". When the value of "host" is "*", as shown at 402 in Fig. 4, the filtering applies to all webpages; when the value of "host" is a "domain name/IP", the filtering applies to the webpages corresponding to that "domain name/IP". If the current URL on which the filter operation is being performed satisfies the filter identifier, the current URL is judged to be a URL to be detected.
As another example, as shown at 404 in Fig. 4, the filter field is denoted by "rule", which may include but is not limited to the following subfields: 1) a characteristic parameter for the status code "HttpCode"; 2) a feature string for the message body "Content". For example, the configuration file sets the status code "HttpCode" to "equal to" the value "200" and sets the string for the message body "Content" to "http://qzone.qq.com/gy/404/data.js". When the URL to be detected successfully matches all subfields in the filter field, the URL to be detected is judged to have matched successfully, and the current URL is filtered out of the to-be-processed URL set.
Optionally, in this embodiment, the configuration file may also include but is not limited to the type name of the configuration file and the attributes of the configuration file, where the attributes may include but are not limited to the time at which the configuration file was added and the person who added it. For example, as shown at 406 in Fig. 4, the type name of the configuration file is "gongyi404"; as shown at 408 in Fig. 4, the configuration file was added on "2013-10-13" by "zhangsan".
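Since Fig. 4 is not reproduced in the text, the following is only a guess at what such a JSON configuration might look like. The keys "host", "rule", "HttpCode", and "Content" and all values come from the description above; the keys "name", "op", "value", "add_time", and "adder" are hypothetical names chosen for this sketch.

```python
import json

# Hypothetical layout for the configuration described with reference to Fig. 4.
CONFIG_JSON = """
{
  "name": "gongyi404",
  "host": "*",
  "rule": {
    "HttpCode": {"op": "=", "value": 200},
    "Content": {"op": "substr", "value": "http://qzone.qq.com/gy/404/data.js"}
  },
  "add_time": "2013-10-13",
  "adder": "zhangsan"
}
"""

config = json.loads(CONFIG_JSON)
print(config["host"], sorted(config["rule"]))
```

Read this way, the example describes pages that answer with status 200 but whose body references a known "404" script, which is the kind of page a plain status-code check would miss.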
Optionally, in this embodiment, after the to-be-processed URLs have been filtered, the URLs that remain once the URLs corresponding to spam pages have been filtered out are saved, so that they can be called up for scanning during the Web security scan, which improves the efficiency of the scan.
Optionally, in this embodiment, the configuration files can be stored in the form of a hash table, in at least one of the following locations: a text file on disk, or a file on a database server. Optionally, the configuration files are loaded from that location only when the to-be-processed URLs need to be filtered. Optionally, in this embodiment, the configuration file may be loaded by, but is not limited to, traversing the hash table to find the configuration file corresponding to the current URL.
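One way to picture the hash-table storage is a dictionary loaded from a text file on disk. The sketch below assumes the file holds a JSON array of configuration objects and that each object carries a "name" key; the file name `filters.json` and the function `load_configs` are hypothetical.

```python
import json
import os
import tempfile


def load_configs(path):
    """Load a JSON array of configuration objects into a dict keyed by "name"."""
    with open(path, encoding="utf-8") as fh:
        return {cfg["name"]: cfg for cfg in json.load(fh)}


# Write a sample file to a temporary directory so the example is self-contained.
sample = [{"name": "gongyi404", "host": "*", "rule": {"HttpCode": 200}}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "filters.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sample, fh)
    configs = load_configs(path)

# Traversing the table visits every configuration that may apply to a URL.
for name, cfg in configs.items():
    print(name, cfg["host"])
```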
Optionally, the preset configuration file is multiple configuration files. In that case, judging whether the current URL is a URL to be detected according to the filter identifier, matching the current URL according to the filter field, and filtering the current URL out of the to-be-processed URL set are performed as follows: a matching configuration file is searched for among the multiple configuration files, where a matching configuration file is one whose filter identifier indicates that the current URL is a URL to be detected and whose filter field successfully matches the current URL; as soon as one matching configuration file is found among the multiple configuration files, the current URL is filtered out of the to-be-processed URL set.
Specifically, the filtering flow is described below with reference to Fig. 1 and the flowchart in Fig. 5, assuming that the URL set includes URL_1, URL_2, URL_3, URL_4, and URL_5, and that the multiple configuration files are P_1, P_2, and P_3.
As shown in Fig. 1 and Fig. 5, the method for filtering the URLs of webpages in this embodiment includes:
S502: the filtering server 102 requests the to-be-processed URL set from the web page server 104;
S504: the web page server 104 returns the to-be-processed URL set to the filtering server 102;
S506: the filtering server 102 judges whether the filter operation has been performed on all URLs, wherein the URL set includes URL_1, URL_2, URL_3; if some URL in the set has not yet undergone the filter operation, step S508 is performed; if every URL in the set has undergone the filter operation, this round of filtering ends;
S508: the filtering server 102 reads one URL (for example, URL_3) that has not yet undergone the filter operation;
S510: the filtering server 102 judges whether the matching operation has been performed against all configuration files; if some configuration file has not yet been used for matching, step S512 is performed; if the matching operation (S514, or S514 and S516) has been performed against all configuration files, the flow returns to S506. The first time the filtering server 102 makes this judgment, it first loads all configuration files (for example, configuration files P_1, P_2, P_3);
S512: the filtering server 102 reads one configuration file (for example, configuration file P_2) that has not yet been used for matching;
S514: the filtering server 102 judges whether the current URL satisfies the scope indicated by the filter identifier; if so, step S516 is performed; otherwise, the flow returns to S510 to look for another configuration file;
S516: the filtering server 102 matches the current URL (for example, URL_3) against the subfields in the filter field one by one;
S518: the filtering server 102 judges whether the match succeeded; if it succeeded, the current URL (for example, URL_3) is judged to be the URL of a spam page and step S520 is performed; if it did not succeed, the flow returns to S510 to judge whether the matching operation has been performed against all configuration files (for example, configuration file P_3 has not yet been used for matching);
S520: the filtering server 102 filters out the URL corresponding to the spam page and returns to step S506 to judge whether the filter operation has been performed on all URLs in the URL set.
From step S518 it follows that if the current URL (for example, URL_3) does not match the configuration file currently being used (P_2), the flow goes back to look for another configuration file, that is, it reads from the multiple configuration files one that has not yet been used for matching, for example configuration file P_3. If the current URL (for example, URL_3) does match the configuration file currently being used (P_2), then P_2 is the matching configuration file that was searched for, step S520 is performed, and the current URL (for example, URL_3), which has been judged to correspond to a spam page, is filtered out. The flow then returns to step S506 to judge whether the filter operation has been performed on all URLs in the URL set.
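The loop structure of S506 to S520 can be sketched as two nested loops with an early exit on the first matching configuration file. The predicates `in_scope` and `rule_matches`, the "needle" key, and the toy URLs are all hypothetical stand-ins; in the described method the rule check would look at the fetched page rather than at the URL string.

```python
def filter_with_configs(urls, configs, in_scope, rule_matches):
    """Split urls into (kept, removed) following the S506-S520 flow."""
    kept, removed = [], []
    for url in urls:                          # S506/S508: next unprocessed URL
        is_spam = False
        for cfg in configs:                   # S510/S512: next unused configuration
            if not in_scope(url, cfg):        # S514: filter identifier check
                continue
            if rule_matches(url, cfg):        # S516/S518: filter field check
                is_spam = True
                break                         # one matching configuration is enough
        (removed if is_spam else kept).append(url)   # S520 or keep
    return kept, removed


# Toy predicates and data (assumed for illustration only).
def in_scope(url, cfg):
    return cfg["host"] == "*" or cfg["host"] in url


def rule_matches(url, cfg):
    return cfg["needle"] in url


configs = [
    {"host": "a.example", "needle": "/404"},   # plays the role of P_1
    {"host": "*", "needle": "/deleted"},       # P_2
    {"host": "b.example", "needle": "/tmp"},   # P_3
]
urls = [
    "http://a.example/404",
    "http://b.example/index",
    "http://c.example/deleted",
    "http://b.example/tmp/x",
    "http://a.example/tmp",
]
kept, removed = filter_with_configs(urls, configs, in_scope, rule_matches)
print(kept)
```

The `break` mirrors the statement that a URL is filtered out as soon as one matching configuration file is found, so later configuration files are never consulted for that URL.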
In the embodiment provided by the present invention, the filter identifier in the preset configuration file is used to select the URLs to be detected from the massive to-be-processed URL set, and the filter field in the configuration file is then used to match those URLs, which yields the URLs that need to be filtered out of the set. The URLs corresponding to unnecessary spam pages in the obtained URL set are thereby filtered out effectively, so that they no longer need to be scanned during Web security scanning, which improves the efficiency of the scan.
As an optional scheme, step S2042, judging whether the current URL is a URL to be detected according to the filter identifier in the preset configuration file, includes:
1) if the filter identifier is a field indicating that all webpages are to be filtered, judging that the current URL is a URL to be detected; or
2) if the filter identifier is a field indicating that a preset domain name is to be filtered, judging whether the current URL includes the preset domain name, and if it does, judging that the current URL is a URL to be detected.
For example, as shown at 402 in Fig. 4, the filter identifier is denoted by "host". When the value of "host" is "*", the filtering applies to all webpages, so the URLs of all webpages are judged to be URLs to be detected, and these URLs then go on to the subsequent matching judgment.
As another example, in the configuration file shown in Fig. 6, when the value of "host" 602 is "www.sina.com/168.1.1.3", the filtering applies to the webpages of the Sina website. If the current URL on which the filter operation is being performed includes the preset domain name, the current URL is judged to be a URL to be detected.
Optionally, in this embodiment, if the current URL does not fall within the scope indicated by the filter identifier, the current URL is not treated as the URL of a spam page, and no further filter judgment is performed on it.
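The scope check on the "host" value might be sketched as below. Two points are assumptions of this sketch: that the "/" in a value such as "www.sina.com/168.1.1.3" separates a domain name from an IP address, and that comparing the parsed host name is an acceptable reading of "the current URL includes the preset domain name". The function name `is_url_to_detect` is hypothetical.

```python
from urllib.parse import urlsplit


def is_url_to_detect(url, host_field):
    """Return True if the URL falls within the scope given by the "host" value."""
    if host_field == "*":
        return True                            # case 1: filter applies to all webpages
    hostname = urlsplit(url).hostname or ""
    allowed = host_field.split("/")            # assumed "domain name/IP" form
    return hostname in allowed                 # case 2: preset domain name or IP


print(is_url_to_detect("http://www.sina.com/news/1.html", "www.sina.com/168.1.1.3"))
print(is_url_to_detect("http://example.org/", "www.sina.com/168.1.1.3"))
```

Comparing the parsed host name instead of searching the whole URL avoids treating a URL as in scope merely because the domain name appears in its path or query string.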
The embodiment provided through the invention screens above-mentioned set of URL conjunction to be processed by filter identifier, with The URL to be detected in corresponding range is obtained, and then within the above range falls the corresponding url filtering of spam page, is realized pair URL in preset range executes filter operation, is improved with reaching to url filtering accuracy.
As an optional scheme, step S2044, matching the current URL according to the filter field in the configuration file, includes:
S1: perform on the current URL the matching operation indicated in the filter field;
S2: judge whether the current URL has been successfully matched according to whether the result of the matching operation satisfies the matching result indicated in the filter field.
Optionally, in this embodiment, the matching operation indicated in the filter field includes but is not limited to characteristic parameter matching and feature string matching. The characteristic parameter may include but is not limited to at least one of: the status code of the webpage corresponding to the current URL, and/or the content length field indicating the size of that webpage. The feature string may include but is not limited to at least one of: the link of the current URL, or a partial string in the link of the current URL.
Optionally, in this embodiment, characteristic parameters and/or feature strings, together with their corresponding matching manners, can be set in the filter field in order to perform the matching operation on the URL to be detected. The matching manner for a characteristic parameter may include but is not limited to: greater than the value of the characteristic parameter, less than the value of the characteristic parameter, or equal to the value of the characteristic parameter. The matching manner for a feature string may include but is not limited to: search matching or regular-expression matching.
For example, the current URL on which the filter operation is being performed is matched against the matching result indicated in the filter field; if the result obtained by performing the matching operation on the current URL satisfies the matching result indicated in the filter field, the current URL is judged to have been successfully matched.
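The matching manners listed above (greater than, less than, equal to; search matching and regular-expression matching) map naturally onto small lookup tables. The operator labels ">", "<", "=", "substr", and "regex" and the function names are assumptions of this sketch, not names confirmed by the text.

```python
import operator
import re

# Matching manners for characteristic parameters such as a status code or content length.
PARAM_OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

# Matching manners for feature strings.
STRING_OPS = {
    "substr": lambda text, pattern: pattern in text,                       # search matching
    "regex": lambda text, pattern: re.search(pattern, text) is not None,   # regular-expression matching
}


def match_param(actual, op, expected):
    """Compare an observed characteristic parameter with the configured value."""
    return PARAM_OPS[op](actual, expected)


def match_string(text, mode, pattern):
    """Check an observed string against the configured feature string."""
    return STRING_OPS[mode](text, pattern)


print(match_param(200, "=", 200))
print(match_string("http://qzone.qq.com/gy/404/data.js", "regex", r"/404/\w+\.js$"))
```

Keeping the operators in tables means a new matching manner can be added without touching the code that evaluates a rule.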
In the embodiment provided by the present invention, the matching operation is performed on the URL to be detected using the filter field in order to judge whether that URL corresponds to an unnecessary spam page. URLs corresponding to spam pages are thereby identified accurately, which improves the efficiency of Web security scanning and reduces its cost.
As an optional scheme, S1, performing on the current URL the matching operation indicated in the filter field, includes: S10, judging whether the characteristic parameter of the current URL and the characteristic parameter in the filter field satisfy the matching condition in the configuration file, where the characteristic parameter includes the status code of the webpage corresponding to the current URL and/or the content length field indicating the size of that webpage. S2, judging whether the current URL has been successfully matched according to whether the result of the matching operation satisfies the matching result indicated in the filter field, includes: S20, if the characteristic parameter of the current URL and the characteristic parameter in the filter field satisfy the matching condition in the configuration file, judging that the current URL has been successfully matched.
For example, with reference to Fig. 6, suppose the matching operation indicated in the filter field is characteristic parameter matching. It is then judged whether the characteristic parameter of the current URL and the characteristic parameter in the filter field satisfy the predetermined matching condition. For example, if the characteristic parameter in the filter field is the status code "HttpCode" and the predetermined matching condition is "=, 200", it is judged whether the characteristic parameter of the current URL (that is, the status code "HttpCode") is equal to 200. If the status code "HttpCode" of the current URL satisfies this condition, the current URL is judged to have been successfully matched.
As a kind of optional scheme, S1, executing the matching operation indicated in filtered fields to current URL includes: S10, Judge whether meet the matching in configuration file between the feature string in feature string and filtered fields in current URL Condition, wherein feature string includes at least one of: the partial character string in the currently link of URL, current URL link; S2 judges whether successfully according to executing the obtained result of matching operation and whether meeting the matching result indicated in filtered fields to working as It includes: S20 that preceding URL, which carries out matching, if meeting configuration between the character string in the feature string and filtered fields in current URL When matching condition in file, then judge successfully to match current URL.
It is specifically illustrated in conjunction with following example, as shown in connection with fig. 6, it is assumed that current URL is executed in filtered fields and is indicated Matching operation be to be matched to feature string, then judge feature string in current URL whether in filtered fields Feature string meet scheduled matching condition.For example, the feature string in above-mentioned filtered fields is message text " Content ", scheduled matching condition be " substr=, stc=" http://news.sina.com/gj/303/ Data.js " ", then judge whether the feature string (that is, message text " Content ") in current URL meets above-mentioned matching item The concatenation character string of world news in part, such as instruction Sina News shown in fig. 6, if being found in current URL above-mentioned Feature string is then judged successfully to match above-mentioned current URL.
As another example, a regular-expression matching condition may be set: a partial string of a complete string is used as the feature string, and regular-expression matching is used to judge whether the current URL contains the feature string set in the matching condition. URLs containing a specific partial string can thereby be filtered, so that the filtering of the current URL in this embodiment can target a whole class of URLs that contain a specific kind of string.
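The two feature-string matching modes (search matching and regular-expression matching) can be sketched as follows. This is an illustrative sketch only: the function name `match_string`, the mode name "regex", the sample message body, and the regular expression are assumptions; the "substr" mode name and the Sina link string are taken from the example.

```python
import re


def match_string(text, mode, pattern):
    """Return True when `text` satisfies a feature-string condition.

    mode "substr": `pattern` must occur somewhere in `text` (search matching).
    mode "regex":  `pattern` is a regular expression searched for in `text`.
    """
    if mode == "substr":
        return pattern in text
    if mode == "regex":
        return re.search(pattern, text) is not None
    raise ValueError("unknown matching mode: " + mode)


# Hypothetical message body that references the link string from the example.
body = '<script src="http://news.sina.com/gj/303/Data.js"></script>'
print(match_string(body, "substr", "http://news.sina.com/gj/303/Data.js"))
# Hypothetical pattern covering a class of links: any three-digit directory.
print(match_string(body, "regex", r"/gj/\d{3}/Data\.js"))
```

The regular-expression mode is what allows one rule to cover a whole class of strings rather than a single literal link.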
In the embodiment provided by the present invention, the characteristic parameter and/or feature string of the current URL is matched, in the predetermined matching mode, against the characteristic parameter and/or feature string set in the filter field. The URLs corresponding to spam pages among the URLs to be detected are thereby identified and filtered out accurately, and Web security scanning is then performed on the filtered URLs, which improves the efficiency of Web security scanning.
As an optional scheme, after the filter operation is performed on each URL in the to-be-processed URL set in step S206, the method further includes:
S1, performing a web page security scanning operation on the to-be-processed web page indicated by each URL in the to-be-processed URL set from which the current URL has been filtered out.
A specific example follows. After the URLs corresponding to spam pages have been filtered out, the URLs remaining in the URL set are saved; when the web page security scanning operation is performed, the saved URLs, which contain no spam pages, are called directly.
In the embodiment provided by the present invention, the web page security scanning operation is performed on the to-be-processed web page indicated by each URL in the to-be-processed URL set from which the URLs corresponding to spam pages have been filtered out. Performing the web page security scanning operation on URLs corresponding to spam pages is thereby avoided, which improves the efficiency of Web security scanning.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of action combinations. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
Embodiment 2
According to an embodiment of the present invention, a device for filtering the uniform resource locator (URL) of a web page is provided. The device can be applied in the hardware environment shown in Fig. 1, wherein the device is located in the filtering server 102 that filters the URLs of web pages. The filtering server 102 can establish a link over a network with the web page server 104 where the web pages reside, and filters the to-be-processed URLs sent by the web page server 104. The network includes but is not limited to: a wide area network, a metropolitan area network, or a local area network.
According to an embodiment of the present invention, a device for filtering the URL of a web page is further provided. As shown in Fig. 7, the device includes:
1) an acquiring unit 702, configured to obtain a to-be-processed URL set, wherein the to-be-processed URL set includes the URLs of multiple to-be-processed web pages;
2) a filter unit 704, configured to perform the following filter operation on each URL in the to-be-processed URL set, wherein the URL in the to-be-processed URL set on which the filter operation is currently being performed is the current URL:
i) judging, according to the filter identifier in a preset configuration file, whether the current URL is a URL to be detected;
ii) when the current URL is a URL to be detected, matching the current URL according to the filter field in the configuration file;
iii) when the current URL is matched successfully according to the filter field, filtering the current URL out of the to-be-processed URL set.
Optionally, in this embodiment, the above method for filtering the URL of a web page can be applied in a Web security scanning process. For example, with reference to Fig. 1, before the Web security scan is performed, the to-be-processed URL set is obtained, wherein the set includes the URLs of multiple to-be-processed web pages, and the filter operation is performed on each URL in the set, so that the URLs corresponding to spam pages, which do not need a Web security scan, are filtered out of the massive number of URLs obtained by the filtering server 102. This is only an example, and this embodiment places no limitation on it.
Optionally, in this embodiment, with reference to Fig. 3, the interaction between the filtering server 102 and the web page server 104 before the to-be-processed URL set is obtained is as follows:
S302, the filtering server 102 sends, over the network, a request to the web page server 104 to obtain the to-be-processed URL set;
S304, in response to the request, the web page server 104 returns the to-be-processed URL set to the filtering server 102.
Optionally, in this embodiment, the configuration file is a file formed from a JSON string that includes the filter identifier and the filter field. JSON (JavaScript Object Notation) is a lightweight, text-based data interchange format that is easy for people to read and easy for machines to parse and generate. The filter identifier can include but is not limited to: the scope within which filtering is performed on the to-be-processed URL set. For example, the scope can include but is not limited to: global web pages and local web pages, wherein local web pages can be selected by a preset domain name. The filter field can include but is not limited to: an indication of the matching result used for filtering the URL to be detected, wherein the filter field can include but is not limited to multiple filter subfields. For example, the matching result can include but is not limited to: a characteristic parameter used for matching and its matching mode, and a feature string used for matching and its matching mode.
For example, the filter identifier is denoted by "host". When the value of "host" is "*", as shown at 402 in Fig. 4, the filtering applies to all web pages; when the value of "host" is "domain name/IP", the filtering applies to the web pages corresponding to that "domain name/IP". When the current URL on which the filter operation is being performed satisfies the filter identifier, the current URL is judged to be a URL to be detected.
As another example, as shown at 404 in Fig. 4, the filter field is denoted by "rule", wherein "rule" can include but is not limited to the following subfields: 1) a characteristic parameter set for the status code "HttpCode"; 2) a feature string set for the message body "Content". For example, the configuration file configures the value of the status code "HttpCode" to be "equal to" the value "200", and configures the string of the message body "Content" to be "http://qzone.qq.com/gy/404/data.js". When the URL to be detected matches all subfields in the filter field, the URL to be detected is judged to have been matched successfully, and the current URL is filtered out of the to-be-processed URL set.
Optionally, in this embodiment, the configuration file can also include but is not limited to: the type name of the configuration file and the attributes of the configuration file, wherein the attributes can include but are not limited to: the time the configuration file was added and the person who added it. For example, as shown at 406 in Fig. 4, the type name of the configuration file is "gongyi404"; as shown at 408 in Fig. 4, the time the configuration file was added is "2013-10-13" and the person who added it is "zhangsan".
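A configuration file of the kind described above might look like the following JSON string, loaded here with Python's standard `json` module. The exact layout of Fig. 4 is not reproduced in the text, so this is only a sketch: the keys "host", "rule", "HttpCode", and "Content" and all values come from the description, while the keys "type", "add_time", "adder", "mode", and "value" are hypothetical names chosen for illustration.

```python
import json

# Hypothetical layout; only "host", "rule", "HttpCode", "Content" and the
# values are taken from the description. Other key names are assumptions.
CONFIG_TEXT = """
{
  "type": "gongyi404",
  "host": "*",
  "rule": {
    "HttpCode": {"mode": "=", "value": 200},
    "Content": {"mode": "substr", "value": "http://qzone.qq.com/gy/404/data.js"}
  },
  "add_time": "2013-10-13",
  "adder": "zhangsan"
}
"""

config = json.loads(CONFIG_TEXT)
print(config["host"])
print(sorted(config["rule"]))
```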
Optionally, in this embodiment, after filtering has been performed on the to-be-processed URLs, the URLs that remain after the URLs corresponding to spam pages have been filtered out are saved so that they can be called during the Web security scan, which improves the efficiency of Web security scanning.
Optionally, in this embodiment, the configuration file can be saved in the form of a hash table, and the save location can be at least one of: a text file on disk, or a file on a database server. Optionally, the configuration file is loaded from that location only when filtering needs to be performed on the to-be-processed URLs, and is then used to filter the URLs in the to-be-processed URL set. Optionally, in this embodiment, the configuration file may be loaded by, but not limited to, traversing the hash table to look up the configuration file corresponding to the current URL.
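Traversing a hash table to find the configuration files that correspond to the current URL can be sketched as follows. The text does not say what the hash table is keyed by, so this sketch assumes, purely for illustration, a Python dict keyed by the "host" value; the function name `configs_for` and the type name "sina303" are hypothetical.

```python
def configs_for(url, table):
    """Traverse the table and return the configurations whose scope covers `url`."""
    matched = []
    for host, config in table.items():
        # "*" applies to every page; otherwise the URL must contain the host.
        if host == "*" or host in url:
            matched.append(config)
    return matched


# Hypothetical table keyed by the "host" value of each configuration file.
table = {
    "*": {"type": "gongyi404"},
    "www.sina.com": {"type": "sina303"},
}
print([c["type"] for c in configs_for("http://www.sina.com/a.html", table)])
```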
Optionally, the preset configuration file is multiple configuration files, wherein the device for filtering the URL of a web page performs the following steps to judge, according to the filter identifier in the preset configuration file, whether the current URL is a URL to be detected, to match the current URL according to the filter field in the configuration file, and to filter the current URL out of the to-be-processed URL set: searching the multiple configuration files for a matching configuration file, wherein a matching configuration file is one whose filter identifier indicates that the current URL is a URL to be detected and whose filter field successfully matches the current URL; and, as long as one matching configuration file is found among the multiple configuration files, filtering the current URL out of the to-be-processed URL set.
Specifically, with reference to Fig. 1 and the flowchart in Fig. 5, the filtering flow for the URL of a web page is described below, assuming that the URL set includes URL_1, URL_2, URL_3, URL_4, and URL_5, and that the multiple configuration files are P_1, P_2, and P_3.
S502, the filtering server 102 requests the to-be-processed URL set from the web page server 104;
S504, the web page server 104 returns the to-be-processed URL set to the filtering server 102;
S506, the filtering server 102 judges whether the filter operation has been performed on all URLs, wherein the URL set includes URL_1, URL_2, URL_3; if some URL in the URL set has not yet undergone the filter operation, step S508 is executed; if every URL in the URL set has undergone the filter operation, this round of filtering ends;
S508, the filtering server 102 reads one URL (for example, URL_3) that has not yet undergone the filter operation;
S510, the filtering server 102 judges whether the matching operation has been performed against all configuration files; if some configuration file has not yet been used for the matching operation, step S512 is executed; if the matching operation (S514, or S514 and S516) has been performed against all configuration files, the flow returns to S506; wherein, the first time the filtering server 102 judges whether all configuration files have been matched, it first loads all configuration files (for example, configuration files P_1, P_2, P_3);
S512, the filtering server 102 reads one configuration file (for example, configuration file P_2) that has not yet been used for the matching operation;
S514, the filtering server 102 judges whether the current URL satisfies the filter identifier; if so, step S516 is executed; otherwise, the flow returns to look up and judge a new configuration file;
S516, the filtering server 102 matches the current URL (for example, URL_3) against the subfields in the filter field one by one;
S518, the filtering server 102 judges whether the match succeeded; if it succeeded, the current URL (for example, URL_3) is judged to be a URL corresponding to a spam page and step S520 is executed; if it did not succeed, the flow returns to S510 to judge whether the matching operation has been performed against all configuration files (for example, configuration file P_3 has not yet been used for the matching operation);
S520, the filtering server 102 filters out the URL corresponding to the spam page and returns to step S506 to judge whether the filter operation has been performed on all URLs in the URL set.
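The loop in steps S506 to S520 can be sketched as follows. This is a simplified illustration, not the implementation of the embodiment: it assumes each page has already been fetched and is represented as a dict with hypothetical keys "url", "status", and "body", it supports only equality on "HttpCode" and substring search on "Content", and the function names and sample pages are invented for the example.

```python
def in_scope(url, host):
    """S514: '*' covers every page; otherwise the URL must contain the domain."""
    return host == "*" or host in url


def rule_matches(page, rule):
    """S516/S518: the match succeeds only if every configured subfield matches."""
    if "HttpCode" in rule and page["status"] != rule["HttpCode"]:
        return False
    if "Content" in rule and rule["Content"] not in page["body"]:
        return False
    return True


def filter_urls(pages, configs):
    """Return the URLs that survive filtering (S506 to S520)."""
    kept = []
    for page in pages:  # S506/S508: take each URL in turn
        is_spam = any(  # S510/S512: try configuration files until one matches
            in_scope(page["url"], cfg["host"]) and rule_matches(page, cfg["rule"])
            for cfg in configs
        )
        if not is_spam:  # S520 drops matched URLs; the rest are kept
            kept.append(page["url"])
    return kept


configs = [
    {"host": "*", "rule": {"HttpCode": 200,
                           "Content": "http://qzone.qq.com/gy/404/data.js"}},
    {"host": "www.sina.com", "rule": {"Content": "http://news.sina.com/gj/303/Data.js"}},
]
# Hypothetical pages: the first embeds the 404 script, the second does not.
pages = [
    {"url": "http://a.example/missing", "status": 200,
     "body": '<script src="http://qzone.qq.com/gy/404/data.js"></script>'},
    {"url": "http://a.example/home", "status": 200, "body": "<h1>Welcome</h1>"},
]
print(filter_urls(pages, configs))
```

Because `any` stops at the first configuration file that matches, a URL is dropped as soon as one matching configuration file is found, which mirrors the jump from S518 to S520.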
In the embodiment provided by the present invention, the filter identifier in the preset configuration file is used to select the URLs to be detected from the massive to-be-processed URL set, and the filter field in the configuration file is then used to match those URLs to be detected, yielding the URLs that need to be filtered out of the to-be-processed URL set. The URLs corresponding to unneeded spam pages in the obtained URL set are thereby filtered effectively, so that they need not be scanned during the Web security scan, which improves the efficiency of Web security scanning.
As an optional scheme, the filter unit 704 includes:
1) a first judgment module, configured to judge that the current URL is a URL to be detected when the filter identifier is a field indicating that all web pages are to be filtered; or
2) a second judgment module, configured to, when the filter identifier is a field indicating that a preset domain name is to be filtered, judge whether the current URL includes the preset domain name, and, if the current URL includes the preset domain name, judge that the current URL is a URL to be detected.
A specific example follows. As shown at 402 in Fig. 4, the filter identifier is denoted by "host". When the value of "host" is "*", the filtering applies to all web pages, so the URLs corresponding to all web pages are judged to be URLs to be detected, and these URLs to be detected are used in the subsequent matching judgment.
A specific example follows, using the configuration file shown in Fig. 6. When the value of "host" 602 is "www.sina.com/168.1.1.3", the filtering applies to the web pages of the Sina website. When the current URL on which the filter operation is being performed is judged to include the preset domain name, the current URL is judged to be a URL to be detected.
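The two cases of the filter identifier ("*" and a preset "domain name/IP") can be sketched as follows. The text does not define how a value such as "www.sina.com/168.1.1.3" is parsed, so this sketch assumes, as one possible reading, that the slash separates alternative identifiers and that the URL qualifies if it contains any of them; the function name `is_candidate` is hypothetical.

```python
def is_candidate(url, host):
    """Judge whether `url` is a URL to be detected under filter identifier `host`."""
    if host == "*":
        # The filtering applies to all web pages.
        return True
    # Assumed reading: "domain/IP" lists alternatives separated by "/".
    return any(part in url for part in host.split("/") if part)


host = "www.sina.com/168.1.1.3"
print(is_candidate("http://www.sina.com/news/a.html", host))
print(is_candidate("http://example.org/index.html", host))
```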
Optionally, in this embodiment, if the current URL does not fall within the scope indicated by the filter identifier, the current URL is not a URL corresponding to a spam page, and no further judgment of the filter operation is performed on it.
The embodiment provided through the invention screens above-mentioned set of URL conjunction to be processed by filter identifier, with The URL to be detected in corresponding range is obtained, and then within the above range falls the corresponding url filtering of spam page, is realized pair URL in preset range executes filter operation, is improved with reaching to url filtering accuracy.
As an optional scheme, the filter unit 704 includes:
1) a matching module, configured to perform on the current URL the matching operation indicated in the filter field;
2) a third judgment module, configured to judge, according to whether the result of the matching operation satisfies the matching result indicated in the filter field, whether the current URL has been matched successfully.
Optionally, in this embodiment, the matching operation indicated in the filter field includes but is not limited to: characteristic-parameter matching and feature-string matching. The characteristic parameter can include but is not limited to at least one of: the status code of the web page corresponding to the current URL, and/or the content-length field indicating the size of the web page corresponding to the current URL. The feature string can include but is not limited to at least one of: the link of the current URL, and a partial string of the link of the current URL.
Optionally, in this embodiment, a characteristic parameter and/or a feature string, together with its corresponding matching mode, can be set in the filter field in order to perform the matching operation on the URL to be detected. Optionally, in this embodiment, the matching mode of the characteristic parameter can include but is not limited to: greater than the value of the characteristic parameter, less than the value of the characteristic parameter, and equal to the value of the characteristic parameter. In this embodiment, the matching mode of the feature string can include but is not limited to: search matching and regular-expression matching.
For example, the current URL on which the filter operation is being performed is matched against the matching result indicated in the filter field; if the result obtained by performing the matching operation on the current URL satisfies the matching result indicated in the filter field, the current URL is judged to have been matched successfully.
In the embodiment provided by the present invention, the matching operation is performed on the URL to be detected using the filter field to further judge whether the URL to be detected is a URL corresponding to an unneeded spam page. URLs corresponding to spam pages are thereby judged accurately, which improves the efficiency of Web security scanning and reduces its cost.
As an optional scheme, the matching module performs on the current URL the matching operation indicated in the filter field by executing the following step: judging whether the characteristic parameter in the current URL and the characteristic parameter in the filter field satisfy the matching condition in the configuration file, wherein the characteristic parameter includes: the status code of the web page corresponding to the current URL, and/or the content-length field indicating the size of the web page corresponding to the current URL. The third judgment module judges, according to whether the result of the matching operation satisfies the matching result indicated in the filter field, whether the current URL has been matched successfully by executing the following step: if the characteristic parameter in the current URL and the characteristic parameter in the filter field satisfy the matching condition in the configuration file, judging that the current URL has been matched successfully.
A specific example follows. With reference to Fig. 6, suppose the matching operation indicated in the filter field and performed on the current URL is a characteristic-parameter match. It is then judged whether the characteristic parameter of the current URL and the characteristic parameter in the filter field satisfy the predetermined matching condition. For example, if the characteristic parameter in the filter field is the status code "HttpCode" and the predetermined matching condition is "=, 200", it is judged whether the characteristic parameter of the current URL (that is, the status code "HttpCode") equals 200. If the status code "HttpCode" of the current URL satisfies this matching condition, the current URL is judged to have been matched successfully.
As an optional scheme, the matching module performs on the current URL the matching operation indicated in the filter field by executing the following step: judging whether the feature string in the current URL and the feature string in the filter field satisfy the matching condition in the configuration file, wherein the feature string includes at least one of: the link of the current URL, and a partial string of the link of the current URL. The third judgment module judges, according to whether the result of the matching operation satisfies the matching result indicated in the filter field, whether the current URL has been matched successfully by executing the following step: if the feature string in the current URL and the feature string in the filter field satisfy the matching condition in the configuration file, judging that the current URL has been matched successfully.
A specific example follows. With reference to Fig. 6, suppose the matching operation indicated in the filter field and performed on the current URL is a feature-string match. It is then judged whether the feature string of the current URL and the feature string in the filter field satisfy the predetermined matching condition. For example, if the feature string in the filter field is the message body "Content" and the predetermined matching condition is "substr=, stc="http://news.sina.com/gj/303/Data.js"", it is judged whether the feature string of the current URL (that is, the message body "Content") satisfies this matching condition, for example whether it contains the link string, shown in Fig. 6, that indicates the international news of Sina News. If the feature string is found in the current URL, the current URL is judged to have been matched successfully.
As another example, a regular-expression matching condition may be set: a partial string of a complete string is used as the feature string, and regular-expression matching is used to judge whether the current URL contains the feature string set in the matching condition. URLs containing a specific partial string can thereby be filtered, so that the filtering of the current URL in this embodiment can target a whole class of URLs that contain a specific kind of string.
In the embodiment provided by the present invention, the characteristic parameter and/or feature string of the current URL is matched, in the predetermined matching mode, against the characteristic parameter and/or feature string set in the filter field. The URLs corresponding to spam pages among the URLs to be detected are thereby identified and filtered out accurately, and Web security scanning is then performed on the filtered URLs, which improves the efficiency of Web security scanning.
As an optional scheme, the preset configuration file is multiple configuration files, wherein the filter unit 704 includes:
1) a lookup module, configured to search the multiple configuration files for a matching configuration file, wherein a matching configuration file is one whose filter identifier indicates that the current URL is a URL to be detected and whose filter field successfully matches the current URL;
2) a filtering module, configured to filter the current URL out of the to-be-processed URL set as long as one matching configuration file is found among the multiple configuration files.
A specific example follows with reference to Fig. 5. As can be seen from step S518 in the above flow, if the current URL (for example, URL_3) is not successfully matched by the configuration file currently being used for matching (P_2), the flow returns to look up and judge a new configuration file: the lookup module searches the multiple configuration files again and reads a configuration file that has not yet been used for the matching operation, for example configuration file P_3. If the current URL (for example, URL_3) is successfully matched by configuration file P_2, that is, configuration file P_2 is the matching configuration file found by the lookup module, the filtering module executes step S520 and filters out the current URL (for example, URL_3), which has been judged to correspond to a spam page.
As an optional scheme, the device further includes:
1) a scanning unit, configured to, after the filter operation has been performed on each URL in the to-be-processed URL set, perform a web page security scanning operation on the to-be-processed web page indicated by each URL in the to-be-processed URL set from which the current URL has been filtered out.
A specific example follows. After the URLs corresponding to spam pages have been filtered out, the URLs remaining in the URL set are saved; when the web page security scanning operation is performed, the saved URLs, which contain no spam pages, are called directly.
In the embodiment provided by the present invention, the web page security scanning operation is performed on the to-be-processed web page indicated by each URL in the to-be-processed URL set from which the URLs corresponding to spam pages have been filtered out. Performing the web page security scanning operation on URLs corresponding to spam pages is thereby avoided, which improves the efficiency of Web security scanning.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
Embodiment 3
According to an embodiment of the present invention, a server for implementing the above filtering of the uniform resource locator (URL) of a web page is further provided. As shown in Fig. 8, the server includes:
1) a memory 802, configured to store the configuration file used for filtering the URLs of web pages and the URLs for which filtering has been completed.
Optionally, in this embodiment, the content stored in the memory 802 can also be obtained from servers other than the filtering server 102; this embodiment places no limitation on this.
Optionally, in this embodiment, the memory 802 can also be used to store other data stored during the filtering process in Embodiment 1 above.
2) a processor 804, configured to perform the following operations through the modules of the above device for filtering the URL of a web page:
S1, obtaining a to-be-processed URL set, wherein the to-be-processed URL set includes the URLs of multiple to-be-processed web pages;
S2, performing the following filter operation on each URL in the to-be-processed URL set, wherein the URL in the to-be-processed URL set on which the filter operation is currently being performed is the current URL:
S20, judging, according to the filter identifier in a preset configuration file, whether the current URL is a URL to be detected;
S22, if the current URL is a URL to be detected, matching the current URL according to the filter field in the configuration file;
S24, if the current URL is matched successfully according to the filter field, filtering the current URL out of the to-be-processed URL set.
Optionally, in this embodiment, the processor 804 is also configured to perform the following operations to judge, according to the filter identifier in the preset configuration file, whether the current URL is a URL to be detected:
S1, if the filter identifier is a field indicating that all web pages are to be filtered, judging that the current URL is a URL to be detected; or
S2, if the filter identifier is a field indicating that a preset domain name is to be filtered, judging whether the current URL includes the preset domain name, and, if the current URL includes the preset domain name, judging that the current URL is a URL to be detected.
Optionally, in this embodiment, the processor 804 is also configured to perform the following operations to match the current URL according to the filter field in the configuration file:
S1, performing on the current URL the matching operation indicated in the filter field;
S2, judging, according to whether the result of the matching operation satisfies the matching result indicated in the filter field, whether the current URL has been matched successfully.
3) a communication interface 806, configured to exchange data with the web page server 104.
Optionally, in this embodiment, the communication interface 806 is also configured to exchange data with servers other than the web page server 104 while the URLs of web pages are being filtered.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in Embodiment 1 and Embodiment 2 above; they are not repeated here.
Embodiment 4
According to embodiments of the present invention, a kind of storage medium is provided, above-mentioned storage medium can be applied to as shown in Figure 1 In hardware environment.Optionally, above-mentioned storage medium can be, but not limited to be located at and hold for the uniform resource position mark URL to webpage In the filtering server 102 of row filtering.
Optionally, in the present embodiment, above-mentioned storage medium can be applied to the mistake of the uniform resource position mark URL of webpage In filter.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
S1: obtain a set of URLs to be processed, wherein the set of URLs to be processed includes the URLs of multiple webpages to be processed;
S2: perform the following filtering operation on each URL in the set of URLs to be processed, wherein the URL on which the filtering operation is currently being performed is the current URL:
S20: determine, according to the filter identifier in a preset configuration file, whether the current URL is a URL to be detected;
S22: if the current URL is a URL to be detected, match the current URL according to the filter fields in the configuration file;
S24: if the current URL is successfully matched according to the filter fields, filter the current URL out of the set of URLs to be processed.
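The control flow of steps S1 to S24 can be sketched as a simple loop. This is a minimal sketch, not the patented implementation: the two predicates are passed in as plain callables, and their names (`is_to_detect`, `matches_filter_fields`) are hypothetical stand-ins for the checks driven by the filter identifier and the filter fields.

```python
from typing import Callable, Iterable, List


def filter_urls(
    urls: Iterable[str],
    is_to_detect: Callable[[str], bool],
    matches_filter_fields: Callable[[str], bool],
) -> List[str]:
    """Return the URLs that remain after the filtering operation."""
    kept = []
    for current_url in urls:  # S2: visit each URL in the set
        # S20 and S22: only URLs to be detected are matched at all.
        if is_to_detect(current_url) and matches_filter_fields(current_url):
            continue  # S24: a successful match filters the URL out
        kept.append(current_url)
    return kept
```

For example, with a domain check on `example.com` and a substring check on `/ad/`, only URLs that satisfy both are dropped; a URL on another domain is kept even if it contains `/ad/`.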
Optionally, the storage medium is further configured to store program code for performing the following steps to determine, according to the filter identifier in the preset configuration file, whether the current URL is a URL to be detected:
S1: if the filter identifier is a field indicating that all webpages are to be filtered, determine that the current URL is a URL to be detected; or
S2: if the filter identifier is a field indicating that a preset domain name is to be filtered, determine whether the current URL includes the preset domain name, and if it does, determine that the current URL is a URL to be detected.
Optionally, the storage medium is further configured to store program code for performing the following steps to match the current URL according to the filter fields in the configuration file:
S1: perform on the current URL the matching operation indicated in the filter fields;
S2: determine whether the current URL has been successfully matched according to whether the result of the matching operation meets the matching result indicated in the filter fields.
Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium capable of storing program code.
Optionally, for specific examples in this embodiment, refer to the examples described in Embodiment 1 and Embodiment 2 above; they are not repeated here.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
If the integrated unit in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the computer-readable storage medium described above. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (16)

1. A method for filtering uniform resource locators (URLs) of webpages, characterized by comprising:
obtaining a set of URLs to be processed, wherein the set of URLs to be processed includes the URLs of multiple webpages to be processed;
performing the following filtering operation on each URL in the set of URLs to be processed, wherein the URL in the set on which the filtering operation is currently being performed is the current URL:
determining, according to a filter identifier in a preset configuration file, whether the current URL is a URL to be detected;
if the current URL is the URL to be detected, matching the current URL according to filter fields in the configuration file;
if the current URL is successfully matched according to the filter fields, filtering the current URL out of the set of URLs to be processed.
2. The method according to claim 1, characterized in that determining, according to the filter identifier in the preset configuration file, whether the current URL is a URL to be detected comprises:
if the filter identifier is a field indicating that all webpages are to be filtered, determining that the current URL is the URL to be detected; or
if the filter identifier is a field indicating that a preset domain name is to be filtered, determining whether the current URL includes the preset domain name, and if the current URL includes the preset domain name, determining that the current URL is the URL to be detected.
3. The method according to claim 1, characterized in that matching the current URL according to the filter fields in the configuration file comprises:
performing on the current URL the matching operation indicated in the filter fields;
determining whether the current URL has been successfully matched according to whether the result of performing the matching operation meets the matching result indicated in the filter fields.
4. The method according to claim 3, characterized in that:
performing on the current URL the matching operation indicated in the filter fields comprises: determining whether the characteristic parameters of the current URL and the characteristic parameters in the filter fields satisfy the matching relationship in the configuration file, wherein the characteristic parameters include: the status code of the webpage corresponding to the current URL, and/or a content length field indicating the size of the webpage corresponding to the current URL;
determining whether the current URL has been successfully matched according to whether the result of performing the matching operation meets the matching result indicated in the filter fields comprises: if the characteristic parameters of the current URL and the characteristic parameters in the filter fields satisfy the matching relationship in the configuration file, determining that the current URL has been successfully matched.
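As an illustration only, and not part of the claim language, a comparison of a page's status code and content length against values held in the filter fields might look like the sketch below. The rule layout (`{"op": ..., "value": ...}`), the operator names, and the `PageFeatures` type are assumptions; the page's features are passed in rather than fetched, so the sketch makes no network calls.

```python
import operator
from dataclasses import dataclass

# Hypothetical comparison operators a configuration file might name.
OPS = {"eq": operator.eq, "ne": operator.ne, "lt": operator.lt, "gt": operator.gt}


@dataclass
class PageFeatures:
    status_code: int      # HTTP status code of the page behind the URL
    content_length: int   # page size in bytes, e.g. from Content-Length


def matches_feature_params(page: PageFeatures, filter_fields: dict) -> bool:
    """True if every configured parameter rule holds for `page`."""
    rules = {
        name: rule
        for name, rule in filter_fields.items()
        if name in ("status_code", "content_length")
    }
    if not rules:
        return False  # nothing configured, so nothing can match
    return all(
        OPS[rule["op"]](getattr(page, name), rule["value"])
        for name, rule in rules.items()
    )
```

A rule set such as status code equal to 200 and content length below 1024 would then match small pages that nonetheless return success, which is one plausible shape for a junk-page filter.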
5. The method according to claim 3, characterized in that:
performing on the current URL the matching operation indicated in the filter fields comprises: determining whether the feature string in the current URL and the feature string in the filter fields satisfy the matching condition in the configuration file, wherein the feature string includes at least one of: the link of the current URL, or a partial string of the link of the current URL;
determining whether the current URL has been successfully matched according to whether the result of performing the matching operation meets the matching result indicated in the filter fields comprises: if the feature string in the current URL and the string in the filter fields satisfy the matching condition in the configuration file, determining that the current URL has been successfully matched.
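As an illustration only, and not part of the claim language, string-based matching on the URL itself could be sketched as follows. The three condition names (`equals`, `contains`, `regex`) are assumptions chosen for this example; the patent only states that a matching condition is taken from the configuration file.

```python
import re


def matches_feature_string(url: str, pattern: str, condition: str) -> bool:
    """Compare the URL (or part of it) with a configured string."""
    if condition == "equals":
        # The whole link must equal the configured string.
        return url == pattern
    if condition == "contains":
        # A partial string of the link must equal the configured string.
        return pattern in url
    if condition == "regex":
        # The configured string is treated as a regular expression.
        return re.search(pattern, url) is not None
    raise ValueError(f"unknown matching condition: {condition}")
```

For instance, `matches_feature_string("http://example.com/ad/1", "/ad/", "contains")` evaluates to `True`, so that URL would count as successfully matched under this hypothetical condition.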
6. The method according to claim 1, characterized in that the configuration file is a file formed from a JSON string that includes the filter identifier and the filter fields.
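As an illustration only, and not part of the claim language, a JSON configuration of this kind could be loaded as shown below. Every key name (`filter_identifier`, `filter_fields`, `operation`, `pattern`, `expected`) is hypothetical; the patent states only that the JSON string carries a filter identifier and filter fields.

```python
import json

# Hypothetical configuration text; all key names are assumptions.
CONFIG_TEXT = """
{
  "filter_identifier": "example.com",
  "filter_fields": {
    "operation": "contains",
    "pattern": "/calendar/",
    "expected": true
  }
}
"""


def load_config(text: str) -> dict:
    """Parse a configuration string and check the two required parts."""
    config = json.loads(text)
    for key in ("filter_identifier", "filter_fields"):
        if key not in config:
            raise ValueError(f"missing key: {key}")
    return config


config = load_config(CONFIG_TEXT)
```

Keeping the identifier and the fields in one small JSON document means a new filtering rule can be added by dropping in another file, without changing the filtering code.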
7. The method according to any one of claims 1 to 6, characterized in that the preset configuration file comprises multiple configuration files, wherein determining, according to the filter identifier in the preset configuration file, whether the current URL is a URL to be detected, matching the current URL according to the filter fields in the configuration file, and filtering the current URL out of the set of URLs to be processed are performed through the following steps:
searching the multiple configuration files for a matching configuration file, wherein the current URL is determined to be the URL to be detected according to the filter identifier in the matching configuration file, and the current URL is successfully matched according to the filter fields in the matching configuration file;
as long as the matching configuration file is found among the multiple configuration files, filtering the current URL out of the set of URLs to be processed.
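As an illustration only, and not part of the claim language, the lookup across multiple configuration files might be sketched like this. Each configuration is modelled as a pair of callables; the `FilterConfig` type and its field names are hypothetical.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional


class FilterConfig(NamedTuple):
    name: str
    is_to_detect: Callable[[str], bool]  # check driven by the filter identifier
    matches: Callable[[str], bool]       # check driven by the filter fields


def find_matching_config(
    url: str, configs: Iterable[FilterConfig]
) -> Optional[FilterConfig]:
    """Return the first configuration under which `url` is detected and matched."""
    for config in configs:
        if config.is_to_detect(url) and config.matches(url):
            return config
    return None


def filter_with_configs(
    urls: Iterable[str], configs: Iterable[FilterConfig]
) -> List[str]:
    """Drop every URL for which any configuration is a matching one."""
    configs = list(configs)  # allow repeated iteration
    return [u for u in urls if find_matching_config(u, configs) is None]
```

The search stops at the first matching configuration, since one match is enough to filter the URL out.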
8. The method according to any one of claims 1 to 6, characterized in that, after the filtering operation is performed on each URL in the set of URLs to be processed, the method further comprises:
performing a webpage security scanning operation on the webpages to be processed indicated by each URL in the set of URLs to be processed from which the current URL has been filtered out.
9. A device for filtering uniform resource locators (URLs) of webpages, characterized by comprising:
an acquiring unit, configured to obtain a set of URLs to be processed, wherein the set of URLs to be processed includes the URLs of multiple webpages to be processed;
a filtering unit, configured to perform the following filtering operation on each URL in the set of URLs to be processed, wherein the URL in the set on which the filtering operation is currently being performed is the current URL:
determining, according to a filter identifier in a preset configuration file, whether the current URL is a URL to be detected;
when the current URL is the URL to be detected, matching the current URL according to filter fields in the configuration file;
when the current URL is successfully matched according to the filter fields, filtering the current URL out of the set of URLs to be processed.
10. The device according to claim 9, characterized in that the filtering unit comprises:
a first judgment module, configured to determine that the current URL is the URL to be detected when the filter identifier is a field indicating that all webpages are to be filtered; or
a second judgment module, configured to, when the filter identifier is a field indicating that a preset domain name is to be filtered, determine whether the current URL includes the preset domain name, and if the current URL includes the preset domain name, determine that the current URL is the URL to be detected.
11. The device according to claim 9, characterized in that the filtering unit comprises:
a matching module, configured to perform on the current URL the matching operation indicated in the filter fields;
a third judgment module, configured to determine whether the current URL has been successfully matched according to whether the result of performing the matching operation meets the matching result indicated in the filter fields.
12. The device according to claim 11, characterized in that:
the matching module performs on the current URL the matching operation indicated in the filter fields through the following step: determining whether the characteristic parameters of the current URL and the characteristic parameters in the filter fields satisfy the matching relationship in the configuration file, wherein the characteristic parameters include: the status code of the webpage corresponding to the current URL, and/or a content length field indicating the size of the webpage corresponding to the current URL;
the third judgment module determines whether the current URL has been successfully matched, according to whether the result of performing the matching operation meets the matching result indicated in the filter fields, through the following step: if the characteristic parameters of the current URL and the characteristic parameters in the filter fields satisfy the matching relationship in the configuration file, determining that the current URL has been successfully matched.
13. The device according to claim 11, characterized in that:
the matching module performs on the current URL the matching operation indicated in the filter fields through the following step: determining whether the feature string in the current URL and the feature string in the filter fields satisfy the matching condition in the configuration file, wherein the feature string includes at least one of: the link of the current URL, or a partial string of the link of the current URL;
the third judgment module determines whether the current URL has been successfully matched, according to whether the result of performing the matching operation meets the matching result indicated in the filter fields, through the following step: if the feature string in the current URL and the string in the filter fields satisfy the matching condition in the configuration file, determining that the current URL has been successfully matched.
14. The device according to claim 9, characterized in that the configuration file is a file formed from a JSON string that includes the filter identifier and the filter fields.
15. The device according to any one of claims 9 to 14, characterized in that the preset configuration file comprises multiple configuration files, wherein the filtering unit comprises:
a searching module, configured to search the multiple configuration files for a matching configuration file, wherein the current URL is determined to be the URL to be detected according to the filter identifier in the matching configuration file, and the current URL is successfully matched according to the filter fields in the matching configuration file;
a filtering module, configured to filter the current URL out of the set of URLs to be processed as long as the matching configuration file is found among the multiple configuration files.
16. The device according to any one of claims 9 to 14, characterized by further comprising:
a scanning unit, configured to, after the filtering operation is performed on each URL in the set of URLs to be processed, perform a webpage security scanning operation on the webpages to be processed indicated by each URL in the set of URLs to be processed from which the current URL has been filtered out.
CN201410284750.6A 2014-06-23 2014-06-23 The filter method and device of the uniform resource position mark URL of webpage Active CN105302815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410284750.6A CN105302815B (en) 2014-06-23 2014-06-23 The filter method and device of the uniform resource position mark URL of webpage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410284750.6A CN105302815B (en) 2014-06-23 2014-06-23 The filter method and device of the uniform resource position mark URL of webpage

Publications (2)

Publication Number Publication Date
CN105302815A CN105302815A (en) 2016-02-03
CN105302815B true CN105302815B (en) 2019-06-07

Family

ID=55200091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410284750.6A Active CN105302815B (en) 2014-06-23 2014-06-23 The filter method and device of the uniform resource position mark URL of webpage

Country Status (1)

Country Link
CN (1) CN105302815B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106227741B (en) * 2016-07-12 2019-08-30 国家计算机网络与信息安全管理中心 A kind of extensive URL matching process based on multilevel hash index chained list
CN106168977B (en) * 2016-07-15 2019-07-02 山谷网安科技股份有限公司 A kind of column recognition methods for web portal security monitoring
CN107066510B (en) * 2017-01-22 2021-12-03 南方科技大学 Information processing method and device
CN108595586B (en) * 2018-04-19 2021-12-24 杭州迪普科技股份有限公司 Method and device for determining search keywords
CN109639686B (en) * 2018-12-17 2022-02-25 江苏满运软件科技有限公司 Distributed webpage filtering method and device, electronic equipment and storage medium
CN111259282B (en) * 2020-02-13 2023-08-29 深圳市腾讯计算机系统有限公司 URL (Uniform resource locator) duplication removing method, device, electronic equipment and computer readable storage medium
CN113411332B (en) * 2021-06-18 2022-10-04 杭州安恒信息技术股份有限公司 CORS vulnerability detection method, device, equipment and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1798147A (en) * 2004-12-28 2006-07-05 华为技术有限公司 Method for matching uniform resource locator
CN102110132A (en) * 2010-12-08 2011-06-29 北京星网锐捷网络技术有限公司 Uniform resource locator matching and searching method, device and network equipment
CN102780681A (en) * 2011-05-11 2012-11-14 中兴通讯股份有限公司 URL (Uniform Resource Locator) filtering system and URL filtering method
CN103793462A (en) * 2013-12-02 2014-05-14 北京奇虎科技有限公司 URL (uniform resource locator) purifying method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于综合接入设备的防火墙研究与实现;刘涛;《中国优秀硕士学位论文全文数据库》;20090615(第2009年06期);全文

Also Published As

Publication number Publication date
CN105302815A (en) 2016-02-03

Similar Documents

Publication Publication Date Title
CN105302815B (en) The filter method and device of the uniform resource position mark URL of webpage
US9954886B2 (en) Method and apparatus for detecting website security
CN102663000B (en) The maliciously recognition methods of the method for building up of network address database, maliciously network address and device
WO2015085948A1 (en) Method, device, and server for friend recommendation
US20160188723A1 (en) Cloud website recommendation method and system based on terminal access statistics, and related device
US10073886B2 (en) Search results based on a search history
CN103077254B (en) Webpage acquisition methods and device
RU2015156608A (en) NETWORK DEVICE AND SERVICE PROCESS MANAGEMENT METHOD
CN107204956B (en) Website identification method and device
CN106453216A (en) Malicious website interception method, malicious website interception device and client
CN110177114A (en) The recognition methods of network security threats index, unit and computer readable storage medium
CN109325161A (en) Public sentiment data grasping means, device, equipment and storage medium
CN104036003B (en) search result integration method and device
EP2857987A1 (en) Acquiring method, device and system of user behavior
CN102867053A (en) Method, device and system for collecting effective information web pages in website information
CN109981745A (en) A kind of journal file processing method and server
CN103186666A (en) Method, device and equipment for searching based on favorites
CN107832221A (en) Platform semi-automation function test method, apparatus and system based on Burpsuit plug-in units
CN108768982A (en) Detection method, device, computing device and the computer storage media of fishing website
CN104967698B (en) A kind of method and apparatus crawling network data
JP2010049473A (en) Link information extraction device, link information extraction method, and program
CN104954415B (en) Handle the method and device of HTTP request
CN105516114B (en) Method and device for scanning vulnerability based on webpage hash value and electronic equipment
CN111209325A (en) Service system interface identification method, device and storage medium
US11556819B2 (en) Collection apparatus, collection method, and collection program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant