CN103701815A - Webpage scanning processing method, device and client - Google Patents

Webpage scanning processing method, device and client

Info

Publication number
CN103701815A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310741650.7A
Other languages
Chinese (zh)
Inventor
李菲
张龙
杨天池
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NSFOCUS Information Technology Co Ltd
Beijing NSFocus Information Security Technology Co Ltd
Original Assignee
NSFOCUS Information Technology Co Ltd
Beijing NSFocus Information Security Technology Co Ltd
Application filed by NSFOCUS Information Technology Co Ltd and Beijing NSFocus Information Security Technology Co Ltd
Priority to CN201310741650.7A
Publication of CN103701815A

Abstract

The invention discloses a webpage scanning processing method, a device, and a client. In the method, the client keeps the number of to-be-scanned URLs (uniform resource locators) in a to-be-scanned URL queue at a maximum concurrency value and, for each to-be-scanned URL in the queue: allocates scanning rules to the current to-be-scanned URL according to stored feature information of the website corresponding to that URL; sends an access request to the website corresponding to the URL; after receiving the access response returned by the website, scans the response using the scanning rules allocated to the URL; and, after obtaining the scanning result for the URL, deletes the URL from the to-be-scanned URL queue. The method can improve processing efficiency and stability.

Description

Webpage scanning processing method, device and client
Technical field
The present invention relates to the field of communication technology, and in particular to a webpage (Web) scanning processing method, device, and client.
Background
Web scanning actively probes web applications to discover and detect vulnerabilities, helping customers strengthen their defenses and eliminate security risks. At present, Web scanning is processed as follows: a crawler module passes the uniform resource locators (URLs) it has crawled to a scan module; the scan module sends a request to the server for each URL one at a time, then scans the response returned by the server using the existing scanning rules and reports the scanning result.
When a large number of URLs must be processed, this method is very inefficient, and a local error during scanning may bring down the whole system. Existing webpage scanning processing methods are therefore inefficient and unstable.
Summary of the invention
Embodiments of the present invention provide a webpage scanning processing method, device, and client to address the low efficiency and poor stability of existing webpage scanning processing methods.
Accordingly, an embodiment of the present invention provides a webpage scanning processing method, comprising:
A client keeping the number of to-be-scanned uniform resource locators (URLs) in a to-be-scanned URL queue at a maximum concurrency value; and
For each to-be-scanned URL in the to-be-scanned URL queue:
Allocating scanning rules to the current to-be-scanned URL according to stored feature information of the website corresponding to the current to-be-scanned URL;
Sending an access request to the website corresponding to the current to-be-scanned URL and, after receiving the access response returned by that website, scanning the access response using the scanning rules allocated to the current to-be-scanned URL;
After obtaining the scanning result for the current to-be-scanned URL, deleting the current to-be-scanned URL from the to-be-scanned URL queue.
Specifically, keeping the number of to-be-scanned URLs in the to-be-scanned URL queue at the maximum concurrency value comprises:
Monitoring the number of to-be-scanned URLs in the to-be-scanned URL queue;
If that number is less than the maximum concurrency value, obtaining URLs from a to-be-scanned URL library and adding them to the to-be-scanned URL queue until the number of to-be-scanned URLs in the queue equals the maximum concurrency value.
Optionally, the method further comprises:
Monitoring whether the time each to-be-scanned URL has spent in the to-be-scanned URL queue exceeds a scan period;
Deleting any to-be-scanned URL whose time in the to-be-scanned URL queue exceeds the scan period.
Optionally, the method further comprises:
If the scanning result obtained for the current to-be-scanned URL contains feature information of the corresponding website that has not yet been stored, storing that feature information.
Optionally, the method further comprises:
If an exception occurs while scanning the access response returned by the website corresponding to a to-be-scanned URL, ending the scan of that access response and deleting that to-be-scanned URL from the to-be-scanned URL queue.
Optionally, the method further comprises:
Dynamically adjusting the maximum concurrency value.
Specifically, dynamically adjusting the maximum concurrency value comprises:
Determining weight factors from at least two stored groups of performance variable values of the client and the corresponding concurrency values;
Determining a candidate maximum concurrency value for the current time from the performance variable values of the client at the current time and the determined weight factors;
If the difference between the candidate maximum concurrency value and the stored concurrency value is less than or equal to a set threshold, adjusting the maximum concurrency value to the candidate maximum concurrency value, and storing the performance variable values of the client at the current time together with the candidate maximum concurrency value.
A webpage scanning processing device is also provided, comprising:
A control unit, configured to keep the number of to-be-scanned uniform resource locators (URLs) in a to-be-scanned URL queue at a maximum concurrency value;
An allocation unit, configured to, for each to-be-scanned URL in the to-be-scanned URL queue, allocate scanning rules to the current to-be-scanned URL according to stored feature information of the website corresponding to the current to-be-scanned URL;
A scanning unit, configured to send an access request to the website corresponding to the current to-be-scanned URL and, after receiving the access response returned by that website, scan the access response using the scanning rules allocated to the current to-be-scanned URL; and, after obtaining the scanning result for the current to-be-scanned URL, notify the control unit to delete the current to-be-scanned URL from the to-be-scanned URL queue.
Specifically, the control unit is configured to:
Monitor the number of to-be-scanned URLs in the to-be-scanned URL queue;
If that number is less than the maximum concurrency value, obtain URLs from a to-be-scanned URL library and add them to the to-be-scanned URL queue until the number of to-be-scanned URLs in the queue equals the maximum concurrency value.
Optionally, the control unit is further configured to:
Monitor whether the time each to-be-scanned URL has spent in the to-be-scanned URL queue exceeds a scan period;
Delete any to-be-scanned URL whose time in the to-be-scanned URL queue exceeds the scan period.
Optionally, the scanning unit is further configured to:
If the scanning result obtained for the current to-be-scanned URL contains feature information of the corresponding website that has not yet been stored, store that feature information.
Optionally, the scanning unit is further configured to:
If an exception occurs while scanning the access response returned by the website corresponding to a to-be-scanned URL, end the scan of that access response and delete that to-be-scanned URL from the to-be-scanned URL queue.
Optionally, the control unit is further configured to:
Dynamically adjust the maximum concurrency value.
Specifically, to dynamically adjust the maximum concurrency value, the control unit is configured to:
Determine weight factors from at least two stored groups of performance variable values of the client in which the unit resides and the corresponding concurrency values;
Determine a candidate maximum concurrency value for the current time from the performance variable values of the client at the current time and the determined weight factors;
If the difference between the candidate maximum concurrency value and the stored concurrency value is less than or equal to a set threshold, adjust the maximum concurrency value to the candidate maximum concurrency value, and store the performance variable values of the client at the current time together with the candidate maximum concurrency value.
A client is also provided, comprising the webpage scanning processing device described above.
The webpage scanning processing method, device, and client provided by the embodiments of the present invention keep the number of to-be-scanned URLs in the to-be-scanned URL queue at the maximum concurrency value, so that the number of to-be-scanned URLs processed simultaneously equals the maximum concurrency value, which improves processing efficiency. In addition, instead of scanning every to-be-scanned URL with all scanning rules, scanning rules are allocated to each to-be-scanned URL, which saves scanning time and further improves processing efficiency. Because the scanning process for each to-be-scanned URL is independent, an exception during scanning does not affect the performance of the client, so stability is higher.
Brief description of the drawings
Fig. 1 is a flowchart of the webpage scanning processing method in an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the webpage scanning processing device in an embodiment of the present invention.
Detailed description
To address the low efficiency and poor stability of existing webpage scanning processing methods, an embodiment of the present invention provides a webpage scanning processing method. The method is executed by a client; its flow is shown in Fig. 1 and comprises the following steps:
S10: Keep the number of to-be-scanned URLs in the to-be-scanned URL queue at the maximum concurrency value.
After a user enters into the client the URL of the website to be accessed, the client sends an access request to that website. After the access response returned by the website is received, a crawler module crawls URLs from the access response and adds them to a to-be-scanned URL library. The client obtains URLs from the to-be-scanned URL library, adds them to the to-be-scanned URL queue, and keeps the number of to-be-scanned URLs in the queue at the maximum concurrency value.
The maximum concurrency value may be a preset value, or a value adjusted in real time according to the performance variables of the client.
S11: For each to-be-scanned URL in the to-be-scanned URL queue, allocate scanning rules to the current to-be-scanned URL according to the stored feature information of the website corresponding to it.
Scanning rules can be allocated to each to-be-scanned URL before it is scanned. As the number of known vulnerabilities keeps growing, so does the number of corresponding scanning rules; scanning every to-be-scanned URL with all scanning rules would waste a great deal of time. Scanning rules can be divided into several levels, for example four levels: site, directory, file, and URL. For URLs belonging to the same website, the information returned by that website is the same. Therefore, once this information has been scanned, it need not be scanned again, and site-level scanning rules need not be allocated again when rules are allocated. The same applies to scanning rules of the other levels.
For example, if information such as the server version and location of a website has already been stored, scanning rules of this kind need not be allocated to subsequent to-be-scanned URLs.
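The rule allocation of S11 can be illustrated with a short Python sketch. It is only one possible reading of the text, under stated assumptions: each rule is tagged with one of the four levels and with the piece of information it collects, and the stored feature information is a dictionary keyed by site, directory, or file. The names `Rule`, `scope_key`, `allocate_rules`, and `stored_features`, and all sample values, are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass
from posixpath import dirname
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Rule:
    name: str
    level: str    # "site", "directory", "file" or "url"
    feature: str  # the piece of information this rule collects (assumed field)


def scope_key(url: str, level: str) -> str:
    """Return the part of the URL that a rule of the given level applies to."""
    parts = urlsplit(url)
    if level == "site":
        return parts.netloc
    if level == "directory":
        return parts.netloc + dirname(parts.path)
    if level == "file":
        return parts.netloc + parts.path
    return url  # "url" level: the full URL, query string included


def allocate_rules(url, rules, stored_features):
    """Keep only the rules whose feature is not yet stored for the URL's scope."""
    return [
        rule for rule in rules
        if rule.feature not in stored_features.get(scope_key(url, rule.level), {})
    ]


# Hypothetical rules and stored information, for illustration only.
rules = [
    Rule("server-banner", "site", "server_version"),
    Rule("directory-listing", "directory", "listing_enabled"),
    Rule("sql-injection", "url", "sqli"),
]
stored_features = {"example.com": {"server_version": "nginx/1.4"}}
allocated = allocate_rules("http://example.com/a/b.php?id=1", rules, stored_features)
```

Here the site-level rule is skipped because the server version of `example.com` is already stored, while the directory-level and URL-level rules are still allocated.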
S12: Send an access request to the website corresponding to the current to-be-scanned URL and, after receiving the access response returned by that website, scan the access response using the scanning rules allocated to the current to-be-scanned URL.
S13: After obtaining the scanning result for the current to-be-scanned URL, delete the current to-be-scanned URL from the to-be-scanned URL queue.
A new to-be-scanned URL can then be added to the to-be-scanned URL queue.
In this scheme, the number of to-be-scanned URLs in the to-be-scanned URL queue is kept at the maximum concurrency value, so that the number of to-be-scanned URLs processed simultaneously equals the maximum concurrency value, which improves processing efficiency. In addition, instead of scanning every to-be-scanned URL with all scanning rules, scanning rules are allocated to each to-be-scanned URL, which saves scanning time and further improves processing efficiency. Because the scanning process for each to-be-scanned URL is independent, an exception during scanning does not affect the performance of the client, so stability is higher.
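Steps S10 to S13 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of the embodiment: a thread pool stands in for concurrent scanning, the dictionary `in_flight` plays the role of the to-be-scanned URL queue, and `scan_all`, `allocate`, `fetch`, and `scan` are hypothetical names, the last three being supplied by the caller. Exception handling is left out here for brevity.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def scan_all(url_store, max_concurrent, allocate, fetch, scan):
    pending = list(url_store)  # URLs crawled but not yet queued
    in_flight = {}             # future -> URL: the to-be-scanned URL queue
    results = {}

    def process(url):
        rules = allocate(url)         # S11: allocate scanning rules
        response = fetch(url)         # S12: send request, receive response
        return scan(response, rules)  # S12: scan the response

    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        while pending or in_flight:
            # S10: top the queue up to the maximum concurrency value.
            while pending and len(in_flight) < max_concurrent:
                url = pending.pop(0)
                in_flight[pool.submit(process, url)] = url
            done, _ = wait(list(in_flight), return_when=FIRST_COMPLETED)
            for future in done:
                url = in_flight.pop(future)  # S13: delete from the queue
                results[url] = future.result()
    return results


# Stub callables so the sketch runs without any network access.
results = scan_all(
    ["http://example.com/a", "http://example.com/b", "http://example.com/c"],
    max_concurrent=2,
    allocate=lambda url: ["rule-1"],
    fetch=lambda url: "response for " + url,
    scan=lambda response, rules: {"rules": rules, "length": len(response)},
)
```

With `max_concurrent=2`, at most two URLs are in the queue at any moment, and the third is admitted as soon as one of the first two finishes.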
Specifically, keeping the number of to-be-scanned URLs in the to-be-scanned URL queue at the maximum concurrency value in S10 comprises:
Monitoring the number of to-be-scanned URLs in the to-be-scanned URL queue;
If that number is less than the maximum concurrency value, obtaining URLs from the to-be-scanned URL library and adding them to the to-be-scanned URL queue until the number of to-be-scanned URLs in the queue equals the maximum concurrency value.
The number of to-be-scanned URLs in the to-be-scanned URL queue can be monitored in real time. When it falls below the maximum concurrency value, URLs are obtained from the to-be-scanned URL library promptly so that the queue again holds the maximum concurrency value of URLs. This keeps the resources of the client fully used rather than wasted.
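The refill step can be written as a small function. This is a minimal sketch, assuming the queue is a Python list and the to-be-scanned URL library is a `deque`; the name `refill` and the sample URLs are hypothetical.

```python
from collections import deque


def refill(queue: list, url_store: deque, max_concurrent: int) -> int:
    """Move URLs from the store into the queue until the queue holds
    max_concurrent URLs or the store is empty. Returns how many were added."""
    added = 0
    while len(queue) < max_concurrent and url_store:
        queue.append(url_store.popleft())
        added += 1
    return added


url_store = deque(["u1", "u2", "u3", "u4", "u5"])
queue = ["u0"]
first_added = refill(queue, url_store, max_concurrent=3)   # queue reaches 3 URLs

queue.remove("u0")                                          # a scan finished
second_added = refill(queue, url_store, max_concurrent=3)  # one slot is refilled
```

Calling `refill` whenever a URL leaves the queue keeps the queue length at the maximum concurrency value for as long as the library still has URLs.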
Optionally, the above Web scanning processing method further comprises:
Monitoring whether the time each to-be-scanned URL has spent in the to-be-scanned URL queue exceeds a scan period;
Deleting any to-be-scanned URL whose time in the to-be-scanned URL queue exceeds the scan period.
A scan period can be set. If a to-be-scanned URL has been in the to-be-scanned URL queue for longer than this scan period, it is deleted directly, which prevents it from occupying, and thus wasting, the resources of the client indefinitely.
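The scan-period check can be sketched as follows, assuming the client records the time at which each URL entered the queue. The name `evict_expired`, the `enqueued_at` dictionary, and the sample times are hypothetical; a real client would presumably also cancel the scan that is still running for an evicted URL, which this sketch does not show.

```python
import time
from typing import Dict, List, Optional


def evict_expired(enqueued_at: Dict[str, float], scan_period: float,
                  now: Optional[float] = None) -> List[str]:
    """Remove and return the URLs that have been queued longer than scan_period.

    enqueued_at maps each queued URL to the time.monotonic() value at which
    it entered the queue.
    """
    if now is None:
        now = time.monotonic()
    expired = [url for url, t in enqueued_at.items() if now - t > scan_period]
    for url in expired:
        del enqueued_at[url]
    return expired


# Fixed timestamps make the example deterministic.
queue_times = {"http://example.com/slow": 0.0, "http://example.com/new": 8.0}
expired = evict_expired(queue_times, scan_period=5.0, now=10.0)
```

The URL queued at time 0 has waited 10 seconds, longer than the 5-second scan period, so it is evicted; the URL queued at time 8 stays.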
Optionally, the above Web scanning processing method further comprises: if the scanning result obtained for the current to-be-scanned URL contains feature information of the corresponding website that has not yet been stored, storing that feature information.
Information that has already been obtained is then not scanned again, which saves processing time and improves processing efficiency.
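Storing only the feature information that is not yet known can be sketched like this. The dictionary layout and the names `store_new_features` and `stored_features` are assumptions made for illustration, as are the sample values.

```python
def store_new_features(stored_features: dict, site: str, scan_result: dict) -> dict:
    """Store the features in scan_result that are not yet stored for site.

    Returns only the newly stored features.
    """
    known = stored_features.setdefault(site, {})
    new = {key: value for key, value in scan_result.items() if key not in known}
    known.update(new)
    return new


stored_features = {"example.com": {"server_version": "nginx/1.4"}}
newly_stored = store_new_features(
    stored_features,
    "example.com",
    {"server_version": "nginx/1.4", "location": "CN"},
)
```

Only `location` is new in this example; the server version was already stored, so later URLs of the same site need no rule that collects either item.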
Optionally, the above Web scanning processing method further comprises:
If an exception occurs while scanning the access response returned by the website corresponding to a to-be-scanned URL, ending the scan of that access response and deleting that to-be-scanned URL from the to-be-scanned URL queue.
A new URL can then be added to the to-be-scanned URL queue, so that the resources of the client remain fully used and processing efficiency is maintained.
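One way to isolate a failing scan is to wrap each URL's scan in its own exception handler and always release its queue slot. The following is a sketch under that assumption; `scan_one`, `results`, `errors`, and the stub `scan` function are hypothetical names.

```python
def scan_one(url, queue, scan, results, errors):
    """Scan one URL. Whatever happens, remove it from the queue afterwards
    so that a new URL can take its slot."""
    try:
        results[url] = scan(url)
    except Exception as exc:
        # Abnormal scan: end it and record the error; other scans are unaffected.
        errors[url] = exc
    finally:
        queue.remove(url)


def scan(url):
    # Stub scanner: fails for one URL to simulate an abnormal scan.
    if url == "bad":
        raise ValueError("malformed response")
    return "ok"


queue = ["good", "bad"]
results, errors = {}, {}
for url in list(queue):  # iterate over a copy because scan_one mutates the queue
    scan_one(url, queue, scan, results, errors)
```

The failure of `bad` is recorded in `errors` and does not prevent `good` from completing, and both slots in the queue are freed.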
Optionally, the above Web scanning processing method further comprises: dynamically adjusting the maximum concurrency value.
Specifically, dynamically adjusting the maximum concurrency value comprises:
Determining weight factors from at least two stored groups of performance variable values of the client and the corresponding concurrency values;
Determining a candidate maximum concurrency value for the current time from the performance variable values of the client at the current time and the determined weight factors;
If the difference between the candidate maximum concurrency value and the stored concurrency value is less than or equal to a set threshold, adjusting the maximum concurrency value to the candidate maximum concurrency value, and storing the performance variable values of the client at the current time together with the candidate maximum concurrency value.
To maximize utilization of the client's resources, the maximum concurrency value needs to be adjusted dynamically according to the performance variable values of the client.
The performance variables of the client include central processing unit (CPU) usage, memory size, swap space size, total number of processes, and so on. Together, these variables determine the maximum concurrency at a given moment.
The relationship between the maximum concurrency $Y$ and the performance variables $x_n$ of the client is:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon \qquad (1)$$
where $x_n$ are the performance variables of the client, $\beta_n$ are the weight factors, and $\varepsilon$ is the error between the estimated value and the actual measured value, with $\varepsilon \sim N(0, \sigma^2)$; this range is an empirical value.
During the period $[t_0, t_p]$, multiple groups of variables $x_n$ and $y_n$ are stored, denoted $X$ and $Y$, where:
$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}$$
The weight factors $\beta_n$ can be expressed as $B$: $B = (X'X)^{-1}X'Y$, where $X'$ is the transpose of $X$.
Substituting the performance variable values of the client at the current time and the calculated $\beta_n$ into formula (1) gives the candidate maximum concurrency value $y'_n$ for the current time. The difference between $y_n$ and $y'_n$ is then calculated; if it is less than or equal to $\varepsilon$, $y'_n$ is taken as the maximum concurrency value. In this way the maximum concurrency value is adjusted dynamically so that the resources of the client are used to the fullest.
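The calculation can be sketched in plain Python. The sketch solves the normal equations $(X'X)B = X'Y$ by Gauss-Jordan elimination instead of forming the inverse explicitly, and it prepends a constant 1 to each stored sample so that $\beta_0$ is estimated as well; that extra column is an assumption, since the text does not say how $\beta_0$ is obtained. The function names, the choice of CPU usage and free memory as performance variables, and all sample values are hypothetical, and the stored concurrency value is taken to be the current maximum.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(samples, concurrency):
    """Least-squares weights B satisfying (X'X) B = X'Y.
    A leading 1 is added to every sample for the intercept beta_0."""
    x = [[1.0] + list(row) for row in samples]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, concurrency)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, performance):
    """Evaluate formula (1) without the error term."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], performance))


def adjust(current_max, weights, performance, threshold):
    """Adopt the candidate value only if it is within threshold of the current one."""
    candidate = round(predict(weights, performance))
    return candidate if abs(candidate - current_max) <= threshold else current_max


# Hypothetical history: (CPU usage, free memory in GB) -> observed concurrency.
samples = [(0.2, 4.0), (0.5, 4.0), (0.5, 8.0), (0.8, 2.0)]
concurrency = [42.0, 33.0, 41.0, 20.0]
weights = fit_weights(samples, concurrency)

accepted = adjust(33, weights, (0.6, 4.0), threshold=5)  # candidate is close enough
rejected = adjust(33, weights, (0.6, 4.0), threshold=2)  # candidate is too far off
```

The sample history was constructed to lie exactly on the plane $y = 40 - 30x_1 + 2x_2$, so the fitted weights recover those coefficients and the candidate for `(0.6, 4.0)` is 30. Real measurements would not fit exactly, which is what the error term $\varepsilon$ accounts for.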
Based on the same inventive concept, an embodiment of the present invention provides a webpage scanning processing device. The device can be arranged in a client; its structure is shown in Fig. 2 and comprises:
A control unit 20, configured to keep the number of to-be-scanned URLs in the to-be-scanned URL queue at the maximum concurrency value.
An allocation unit 21, configured to, for each to-be-scanned URL in the to-be-scanned URL queue, allocate scanning rules to the current to-be-scanned URL according to the stored feature information of the website corresponding to the current to-be-scanned URL.
A scanning unit 22, configured to send an access request to the website corresponding to the current to-be-scanned URL and, after receiving the access response returned by that website, scan the access response using the scanning rules allocated to the current to-be-scanned URL; and, after obtaining the scanning result for the current to-be-scanned URL, notify the control unit 20 to delete the current to-be-scanned URL from the to-be-scanned URL queue.
Specifically, the control unit 20 is configured to:
Monitor the number of to-be-scanned URLs in the to-be-scanned URL queue;
If that number is less than the maximum concurrency value, obtain URLs from the to-be-scanned URL library and add them to the to-be-scanned URL queue until the number of to-be-scanned URLs in the queue equals the maximum concurrency value.
Optionally, the control unit 20 is further configured to:
Monitor whether the time each to-be-scanned URL has spent in the to-be-scanned URL queue exceeds the scan period;
Delete any to-be-scanned URL whose time in the to-be-scanned URL queue exceeds the scan period.
Optionally, the scanning unit 22 is further configured to:
If the scanning result obtained for the current to-be-scanned URL contains feature information of the corresponding website that has not yet been stored, store that feature information.
Optionally, the scanning unit 22 is further configured to:
If an exception occurs while scanning the access response returned by the website corresponding to a to-be-scanned URL, end the scan of that access response and delete that to-be-scanned URL from the to-be-scanned URL queue.
Optionally, the control unit 20 is further configured to:
Dynamically adjust the maximum concurrency value.
Specifically, to dynamically adjust the maximum concurrency value, the control unit 20 is configured to:
Determine weight factors from at least two stored groups of performance variable values of the client in which the unit resides and the corresponding concurrency values;
Determine a candidate maximum concurrency value for the current time from the performance variable values of the client at the current time and the determined weight factors;
If the difference between the candidate maximum concurrency value and the stored concurrency value is less than or equal to a set threshold, adjust the maximum concurrency value to the candidate maximum concurrency value, and store the performance variable values of the client at the current time together with the candidate maximum concurrency value.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although optional embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the optional embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the embodiments of the present invention without departing from their spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (15)

1. A webpage scanning processing method, characterized by comprising:
A client keeping the number of to-be-scanned uniform resource locators (URLs) in a to-be-scanned URL queue at a maximum concurrency value; and
For each to-be-scanned URL in the to-be-scanned URL queue:
Allocating scanning rules to the current to-be-scanned URL according to stored feature information of the website corresponding to the current to-be-scanned URL;
Sending an access request to the website corresponding to the current to-be-scanned URL and, after receiving the access response returned by that website, scanning the access response using the scanning rules allocated to the current to-be-scanned URL;
After obtaining the scanning result for the current to-be-scanned URL, deleting the current to-be-scanned URL from the to-be-scanned URL queue.
2. The method of claim 1, characterized in that keeping the number of to-be-scanned URLs in the to-be-scanned URL queue at the maximum concurrency value comprises:
Monitoring the number of to-be-scanned URLs in the to-be-scanned URL queue;
If that number is less than the maximum concurrency value, obtaining URLs from a to-be-scanned URL library and adding them to the to-be-scanned URL queue until the number of to-be-scanned URLs in the queue equals the maximum concurrency value.
3. The method of claim 1, characterized by further comprising:
Monitoring whether the time each to-be-scanned URL has spent in the to-be-scanned URL queue exceeds a scan period;
Deleting any to-be-scanned URL whose time in the to-be-scanned URL queue exceeds the scan period.
4. The method of claim 1, characterized by further comprising:
If the scanning result obtained for the current to-be-scanned URL contains feature information of the corresponding website that has not yet been stored, storing that feature information.
5. The method of claim 1, characterized by further comprising:
If an exception occurs while scanning the access response returned by the website corresponding to a to-be-scanned URL, ending the scan of that access response and deleting that to-be-scanned URL from the to-be-scanned URL queue.
6. The method of claim 1, characterized by further comprising:
Dynamically adjusting the maximum concurrency value.
7. The method of claim 6, characterized in that dynamically adjusting the maximum concurrency value comprises:
Determining weight factors from at least two stored groups of performance variable values of the client and the corresponding concurrency values;
Determining a candidate maximum concurrency value for the current time from the performance variable values of the client at the current time and the determined weight factors;
If the difference between the candidate maximum concurrency value and the stored concurrency value is less than or equal to a set threshold, adjusting the maximum concurrency value to the candidate maximum concurrency value, and storing the performance variable values of the client at the current time together with the candidate maximum concurrency value.
8. A webpage scanning processing device, characterized in that it comprises:
A control unit, configured to control the number of to-be-scanned URLs in a to-be-scanned uniform resource locator (URL) queue to be a maximum concurrent value;
An allocation unit, configured to perform, for each to-be-scanned URL in the to-be-scanned URL queue: allocating a scanning rule to the current to-be-scanned URL according to stored characteristic information of the website corresponding to the current to-be-scanned URL;
A scanning unit, configured to send an access request to the website corresponding to the current to-be-scanned URL; after receiving the access response returned by that website, scan the access response using the scanning rule allocated to the current to-be-scanned URL; and, after obtaining the scanning result for the current to-be-scanned URL, notify the control unit to delete the current to-be-scanned URL from the to-be-scanned URL queue.
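The cooperation of the allocation and scanning steps might be sketched as below. This is a simplified, sequential illustration under several assumptions: scanning rules are modeled as named substrings to search for in the response, characteristic information is a set of labels per host, and the `fetch` callable stands in for sending the access request. None of these names or representations come from the patent.

```python
from typing import Callable, Dict, List, Set
from urllib.parse import urlparse


def scan_queue(queue: List[str],
               site_features: Dict[str, Set[str]],
               rules_by_feature: Dict[str, Dict[str, str]],
               fetch: Callable[[str], str]) -> Dict[str, List[str]]:
    """Scan every URL in the queue, removing each one once it has a result."""
    results: Dict[str, List[str]] = {}
    for url in list(queue):  # iterate over a copy because the queue shrinks
        site = urlparse(url).netloc
        # Allocation: pick only the rules matching the stored site features.
        rules: Dict[str, str] = {}
        for feature in sorted(site_features.get(site, set())):
            rules.update(rules_by_feature.get(feature, {}))
        # Scanning: request the page, then apply the allocated rules.
        response = fetch(url)
        results[url] = [name for name, needle in rules.items()
                        if needle in response]
        # Result obtained: delete the URL from the queue.
        queue.remove(url)
    return results
```

A real client would run the scans concurrently up to the maximum concurrent value; the loop here is sequential only to keep the control flow visible.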
9. The device of claim 8, characterized in that the control unit is specifically configured to:
Monitor the number of to-be-scanned URLs in the to-be-scanned URL queue;
If the number of to-be-scanned URLs in the to-be-scanned URL queue is less than the maximum concurrent value, obtain URLs from a to-be-scanned URL library and add them to the to-be-scanned URL queue until the number of to-be-scanned URLs in the to-be-scanned URL queue equals the maximum concurrent value.
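A short sketch of the refill behavior, assuming the to-be-scanned URL library can be modeled as an in-memory `deque`; the function name `refill` is hypothetical.

```python
from collections import deque
from typing import Deque, List


def refill(queue: List[str], url_library: Deque[str],
           max_concurrent: int) -> int:
    """Top the queue up to max_concurrent URLs; return how many were added."""
    added = 0
    # Stop when the queue is full or the library has no URLs left.
    while len(queue) < max_concurrent and url_library:
        queue.append(url_library.popleft())
        added += 1
    return added
```

Because a URL leaves the queue as soon as its result is obtained, running this after each deletion keeps the number of in-flight scans at the maximum concurrent value whenever enough URLs remain.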
10. The device of claim 8, characterized in that the control unit is further configured to:
Monitor whether the time each to-be-scanned URL has spent in the to-be-scanned URL queue exceeds a scan period;
Delete any to-be-scanned URL whose time in the to-be-scanned URL queue exceeds the scan period.
11. The device of claim 8, characterized in that the scanning unit is further configured to:
If the obtained scanning result for the current to-be-scanned URL includes characteristic information of the corresponding website that has not yet been stored, store that characteristic information.
12. The device of claim 8, characterized in that the scanning unit is further configured to:
If an exception occurs while scanning the access response returned by the website corresponding to a to-be-scanned URL, end the scan of that access response and delete that to-be-scanned URL from the to-be-scanned URL queue.
13. The device of claim 8, characterized in that the control unit is further configured to:
Dynamically adjust the maximum concurrent value.
14. The device of claim 13, characterized in that the control unit, when dynamically adjusting the maximum concurrent value, is specifically configured to:
Determine a weight factor according to at least two stored groups of performance variable values of the client on which the device resides and their corresponding concurrent values;
Determine a pending maximum concurrent value for the current time according to the performance variable values of the client at the current time and the determined weight factor;
If the difference between the pending maximum concurrent value and the stored concurrent value is less than or equal to a set threshold, adjust the maximum concurrent value to the pending maximum concurrent value, and store the performance variable values of the client at the current time together with the pending maximum concurrent value.
15. A client, characterized in that it comprises the webpage scanning processing device of any one of claims 8 to 14.
CN201310741650.7A 2013-12-27 2013-12-27 Webpage scanning processing method, device and client Pending CN103701815A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310741650.7A CN103701815A (en) 2013-12-27 2013-12-27 Webpage scanning processing method, device and client

Publications (1)

Publication Number Publication Date
CN103701815A true CN103701815A (en) 2014-04-02

Family

ID=50363211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310741650.7A Pending CN103701815A (en) 2013-12-27 2013-12-27 Webpage scanning processing method, device and client

Country Status (1)

Country Link
CN (1) CN103701815A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107172036A (en) * 2017-05-11 2017-09-15 北京安赛创想科技有限公司 A kind of network sweep control method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1617499A (en) * 2003-11-12 2005-05-18 国际商业机器公司 Method of processing a request for a plurality of web services, server and system
US20090003250A1 (en) * 2007-06-29 2009-01-01 Kabushiki Kaisha Toshiba Wireless Communication Device, Wireless Communication System and Network Control Method
CN101888312A (en) * 2009-05-15 2010-11-17 北京启明星辰信息技术股份有限公司 Attack detection and response method and device of WEB page
CN102789502A (en) * 2012-07-17 2012-11-21 北京奇虎科技有限公司 Method and device for scanning website
CN102932370A (en) * 2012-11-20 2013-02-13 华为技术有限公司 Safety scanning method, equipment and system

Similar Documents

Publication Publication Date Title
US9043383B2 (en) Stream processing using a client-server architecture
CN102833298A (en) Distributed repeated data deleting system and processing method thereof
CN102857578A (en) File uploading method and file uploading system of network drive and network drive client
CN103561049A (en) Method for processing terminal scheduling request, system thereof and device thereof
CN104363282B (en) A kind of cloud computing resource scheduling method and device
CN104346345A (en) Data storage method and device
CN102394880A (en) Method and device for processing jump response in content delivery network
CN104683408A (en) Method and system for OpenStack cloud computing management platform to build virtual machine instance
CN103870591A (en) Method and system for carrying out parallel spatial analysis service based on spatial data
KR102486704B1 (en) Client and server in supervisory control and data acquisition system
CN103986783A (en) Cloud computing system
CN104184765A (en) Request control method, client apparatus and server-side apparatus
CN109861922B (en) Method and apparatus for controlling flow
CN107846322A (en) A kind of monitoring system of self-service device
CN103701815A (en) Webpage scanning processing method, device and client
CN103763133B (en) Method, equipment and system for realizing access control
CN105357317B (en) A kind of data uploading method and system based on multi-client repeating query queuing
CN106844420A (en) Based on user packet method and device that social networks and big data are analyzed
CN105917694B (en) Service in telecommunication network provides and activation
Zaharia et al. Fast and optimal scheduling over multiple network interfaces
CN108124021A (en) Internet protocol IP address obtains, the method, apparatus and system of website visiting
Bogoiavlenskaia et al. Individual client strategies for active control of information-driven service construction in IoT-enabled smart spaces
CN109150988A (en) A kind of request processing method and its server
CN104168274A (en) Data obtaining request processing method, client sides and server
AbdulAzeem et al. A framework for ranking uncertain distributed database

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140402