CN102253991B - Uniform resource locator (URL) storage method, web filtering method, device and system - Google Patents


Info

Publication number: CN102253991B
Authority: CN (China)
Prior art keywords: url, bloom filter, deletion, memory device, delta package
Legal status: Active
Application number: CN201110187962.9A
Other languages: Chinese (zh)
Other versions: CN102253991A
Inventor: 王祖海
Current assignees: Beijing Star Net Ruijie Networks Co Ltd; Ruijie Networks Co Ltd
Original assignee: Beijing Star Net Ruijie Networks Co Ltd

Application filed by Beijing Star Net Ruijie Networks Co Ltd
Priority to CN201110187962.9A
Publication of CN102253991A
Application granted
Publication of CN102253991B

Landscapes: Information Transfer Between Computers (AREA)

Abstract

The invention provides a uniform resource locator (URL) storage method, a web filtering method, a web filtering device and a web filtering system. The URL storage method comprises the following steps: S11, classifying URLs according to a predetermined classification rule; S12, generating a separate Bloom filter for storing the URLs of each type; and S13, storing each URL in the Bloom filter corresponding to its type. The URL storage method, web filtering method, web filtering device and web filtering system provide efficient URL lookup during web filtering and thereby improve network performance.

Description

URL storage method, web page filtering method, apparatus and system
Technical field
The present invention relates to the field of communication technology, and in particular to a URL storage method, a web page filtering method, and a corresponding apparatus and system.
Background art
As network technology and network resources develop, the services and functions carried by networks become increasingly diverse. To improve security and to restrict employees' network access, enterprises usually deploy a uniform resource locator (URL) filtering gateway at the egress of the enterprise network. A URL is an identifier that fully describes the address of a web page or other resource on the Internet. Each web page on the Internet has a unique corresponding URL, which may point to a local disk, to a computer on a local area network, or to a website on the Internet; a URL is what is commonly called a web address. The URL filtering gateway audits the URLs that users access, determines whether each one is legitimate, and blocks the access when it is not.
To determine whether a URL accessed by a user is legitimate, the URL filtering gateway relies on a URL library. After obtaining the URL accessed by the user, the gateway queries the URL field of the library for a matching record, then reads the classification field of that record to learn whether the URL is classified as lawful access or unlawful access, and handles the request accordingly. Because the URL filtering gateway is an egress gateway, when there are many users the large volume of URL queries severely degrades network performance.
Summary of the invention
The invention provides a URL storage method, a web page filtering method, and a corresponding apparatus and system that enable efficient URL queries during web page filtering and thereby improve network performance.
The invention provides a URL storage method, comprising:
Step S11: classifying URLs according to a predetermined classification rule;
Step S12: generating a separate Bloom filter for storing the URLs of each type;
Step S13: storing each URL in the Bloom filter corresponding to its type.
According to a further aspect, the invention provides a URL storage device, comprising:
a classification module, configured to classify URLs according to a predetermined classification rule;
a generation module, configured to generate a separate Bloom filter for storing the URLs of each type;
a storage module, configured to store each URL in the Bloom filter corresponding to its type.
According to another aspect, the invention provides a web page filtering method, comprising:
Step S21: a filtering gateway device obtains, from the URL storage device provided by the invention, the Bloom filters in which URLs are stored by type;
Step S22: the filtering gateway device filters web pages according to the URLs stored in the Bloom filters and the types of those URLs.
According to another aspect, the invention provides a filtering gateway device, comprising:
an acquisition module, configured to obtain, from the URL storage device, the Bloom filters in which URLs are stored by type;
a filtering module, configured to filter web pages according to the URLs stored in the Bloom filters and the types of those URLs.
According to a further aspect, the invention provides a web page filtering system, comprising the URL storage device and the filtering gateway device provided by the invention.
With the URL storage method, web page filtering method, apparatus and system of the invention, a Bloom filter is created for each type of URL and each URL is stored in the Bloom filter of its type. After the filtering gateway device obtains these Bloom filters, it can audit the URLs accessed by users without traversing the large URL library for every URL to be audited, which greatly improves lookup efficiency; network performance can therefore be maintained even when many users access the network simultaneously. The scheme also saves a large amount of storage space. In addition, because the individual URLs cannot be recovered from the Bloom filters, the URLs are kept confidential without any additional encryption.
Brief description of the drawings
Fig. 1 is a system architecture diagram of web page filtering based on URLs.
Fig. 2 is a flow chart of the URL storage method of the invention.
Fig. 3 is a schematic structural diagram of the URL storage device of the invention.
Fig. 4 is a flow chart of the web page filtering method of the invention.
Fig. 5 is a schematic structural diagram of the filtering gateway device of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the invention clearer, the technical solutions of the invention are described clearly and completely below with reference to the accompanying drawings.
Fig. 1 is a system architecture diagram of web page filtering based on URLs. As shown in Fig. 1, the system comprises a URL server, which generates the URL library, and a filtering gateway device, which obtains the URL library from the URL server over the Internet and performs web page filtering. The URL storage method of the following embodiments is performed by the URL server.
Fig. 2 is a flow chart of the URL storage method of the invention. As shown in Fig. 2, the URL storage method comprises the following steps:
Step S11: classify URLs according to a predetermined classification rule.
The URL types may include, for example, lawful access and unlawful access. In this step, after the types have been defined, all type names are stored in a management file, and all URLs in the URL library are divided into the defined types according to user requirements.
Step S12: generate a separate Bloom filter for storing the URLs of each type.
This step may specifically comprise:
Step S121: count the URLs of each type;
Step S122: allocate memory space according to the number of URLs and the number of bytes occupied by each URL;
Step S123: determine the hash functions of the Bloom filter according to the number of URLs and the configured maximum false positive rate.
The Bloom filter was proposed by Burton Bloom in 1970. Its principle is as follows. A Bloom filter consists of k independent hash functions h1, h2, ..., hk and a bit vector of length m, where the range of each hash function is {0, 1, ..., m-1}. Because a byte has 8 bits, the bit vector occupies m/8 bytes of memory, and all of its bits are initialized to 0. For a set S = {s1, s2, ..., sn}, the k hash functions are applied to each element s of S to obtain the hash sequence (h1(s), h2(s), ..., hk(s)), and the bits of the bit vector at the positions in that sequence are set to 1; the Bloom filter is then said to represent the data element set S. For example, if h1(s1) = 5, the 6th bit of the bit vector is set to 1; if h2(s1) = 10, the 11th bit is set to 1; and so on through hk(s1). To query whether a data element belongs to S, the same k hash functions are applied to it: if every bit of the bit vector at the resulting positions is 1, the element is considered to belong to S; otherwise it does not belong to S. A negative answer is therefore always correct, while a positive answer may occasionally be wrong.
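The principle above can be illustrated with a minimal Python sketch. It is not the implementation described in this patent: the k hash functions are assumed to be SHA-256 salted with the function index and reduced modulo m, standing in for the unspecified hash family, and the class name BloomFilter is hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash functions over an m-bit vector whose bits start at 0."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # m/8 bytes, rounded up to whole bytes

    def _positions(self, item: str) -> list[int]:
        # Assumption: h_i(s) = SHA-256("i:s") truncated to 64 bits, reduced into {0, ..., m-1}.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, item: str) -> None:
        # Set the bit at every position of the item's hash sequence to 1.
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # Member only if every bit in the hash sequence is 1 (false positives are possible).
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(m=1024, k=4)
    bf.add("http://example.com/")
    print("http://example.com/" in bf)  # True
```

An inserted element is always reported as present; an element that was never inserted is reported as present only when all k of its bits happen to have been set by other elements.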
The memory space allocated in step S122 is the space of the Bloom filter. For example, a type name is read from the management file and used to count the number of URL records of that type in the URL library, and the size of the Bloom filter is then calculated according to formula 1:
$$T = 2^{n} \qquad (1)$$
where T is the size of the Bloom filter and n is the natural number satisfying formula 2:
$$2^{n-1} < \mathrm{count}(url) \times B < 2^{n} \qquad (2)$$
where count(url) is the number of URL records of the type and B is the number of bytes occupied by each URL.
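Formulas 1 and 2 amount to rounding count(url) × B up to a power of two, which can be sketched with Python's int.bit_length(). The function name is hypothetical, and the treatment of a product that is itself an exact power of two (where the strict inequalities of formula 2 have no solution) is an assumption: the sketch then takes the next power up. The usage line reuses the 4,800,000-URL, 20-byte figures given later in this description.

```python
def bloom_filter_size_bytes(url_count: int, bytes_per_url: int) -> int:
    """Return T = 2**n, the smallest power of two strictly greater than url_count * bytes_per_url."""
    total = url_count * bytes_per_url
    if total <= 0:
        raise ValueError("url_count and bytes_per_url must be positive")
    # For a positive integer, bit_length() is the n with 2**(n-1) <= total < 2**n.
    return 1 << total.bit_length()


print(bloom_filter_size_bytes(4_800_000, 20))  # 134217728, i.e. 2**27
```

Since 4,800,000 × 20 = 96,000,000 lies between 2^26 and 2^27, n = 27 in this example.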
In step S123, the hash functions of the Bloom filter may take various forms, provided that the false positive rate after all URLs of the type have been stored does not exceed the configured maximum. The false positive rate is the probability that, when an element is queried, the Bloom filter mistakenly reports an element that is not in the set as belonging to it. After n elements have been inserted into an m-bit Bloom filter that uses k hash functions, the probability that a given bit is still 0 is $(1-1/m)^{kn}$, and the false positive rate is $p = \left[1-(1-1/m)^{kn}\right]^{k}$. Therefore, once the maximum acceptable false positive rate has been set according to user requirements, the number of hash functions k and the bit vector length m can be determined from the number of URL records. For example, for 1,000,000 URL records and a maximum false positive rate of 0.0001, k = 8 and m = 20,000,000 come close but give $p \approx 1.4 \times 10^{-4}$ by the formula above; meeting the target requires m of roughly 21,050,000 bits or more with k = 8, or, keeping m = 20,000,000, k = 14 (near the optimum $k \approx (m/n)\ln 2$), which gives $p \approx 6.7 \times 10^{-5}$. The hash functions are then a set of k randomly constructed mathematical functions whose range is {0, 1, ..., m-1}.
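The parameter choice in step S123 can be checked numerically. This sketch evaluates the false positive formula above and finds, by bisection, the smallest bit vector length m that meets a target for a given k; both function names are hypothetical, and math.log1p is used only to keep (1 - 1/m)^(kn) accurate for large m.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """p = [1 - (1 - 1/m)**(k*n)]**k for an m-bit filter with k hash functions and n elements."""
    prob_bit_still_zero = math.exp(k * n * math.log1p(-1.0 / m))
    return (1.0 - prob_bit_still_zero) ** k


def min_bits_for_target(n: int, k: int, p_max: float) -> int:
    """Smallest m with false_positive_rate(m, n, k) <= p_max (p decreases as m grows)."""
    lo = hi = 2  # m = 1 would make log1p(-1.0) undefined
    while false_positive_rate(hi, n, k) > p_max:
        hi *= 2
    while lo < hi:
        mid = (lo + hi) // 2
        if false_positive_rate(mid, n, k) <= p_max:
            hi = mid
        else:
            lo = mid + 1
    return lo


n = 1_000_000
print(false_positive_rate(20_000_000, n, 8))   # ≈ 1.4e-4, above the 0.0001 target
print(false_positive_rate(20_000_000, n, 14))  # ≈ 6.7e-5
print(min_bits_for_target(n, 8, 1e-4))         # roughly 2.1e7 bits
```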
Step S13: store each URL in the Bloom filter corresponding to its type.
This step may specifically comprise:
Step S131: calculate the hash values of the URL using the hash functions;
Step S132: set a flag at the position corresponding to each hash value in the allocated memory space, according to a predetermined flag-setting method.
The flag-setting method may be any marking scheme chosen by the user. For example, the flag-setting method may be setbit(a, i): ((a)[(i)/NBBY] |= 1 << ((i) % NBBY)), where the array a starts at the first address of the memory space allocated for the Bloom filter, and a[(i)/NBBY] is the byte at index (i)/NBBY of the Bloom filter; i is the calculated hash value of the URL; and NBBY is the number of bits in a byte, i.e. NBBY = 8. The expression ((a)[(i)/NBBY] |= 1 << ((i) % NBBY)) sets bit ((i) % NBBY) of byte (i)/NBBY of the Bloom filter to 1.
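The setbit expression maps directly onto a Python bytearray. The sketch below mirrors it and adds a hypothetical isset companion that tests the same bit on the query side; NBBY = 8 as stated above.

```python
NBBY = 8  # number of bits in a byte, as in the setbit expression


def setbit(a: bytearray, i: int) -> None:
    """Equivalent of ((a)[(i)/NBBY] |= 1 << ((i) % NBBY)): set bit i % NBBY of byte i // NBBY."""
    a[i // NBBY] |= 1 << (i % NBBY)


def isset(a: bytearray, i: int) -> bool:
    """Hypothetical companion: test the bit that setbit(a, i) would set."""
    return bool(a[i // NBBY] & (1 << (i % NBBY)))


bits = bytearray(4)   # a 32-bit vector, all zeros
setbit(bits, 10)      # hash value 10 -> byte 1, bit 2
print(bits.hex())     # 00040000
```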
Steps S131 and S132 are performed for every URL of the same type, so that all of them are stored in that type's Bloom filter, and steps S12 and S13 are performed for the URLs of every type. At this point, the URLs in the URL library are stored in the form of per-type Bloom filters.
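Steps S11 to S13 can be combined into one end-to-end sketch. Assumptions, all hypothetical: the URL library is a dict mapping each URL to its type name, every type uses the same k = 8 hash functions realized as salted SHA-256 (not the randomly constructed functions of step S123), and B defaults to 20 bytes as in the example that follows.

```python
import hashlib
from collections import defaultdict


def hash_positions(url: str, k: int, m_bits: int) -> list[int]:
    # Assumption: salted SHA-256 stands in for the filter's k hash functions.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{url}".encode()).digest()[:8], "big") % m_bits
        for i in range(k)
    ]


def build_classified_filters(url_db: dict[str, str], bytes_per_url: int = 20, k: int = 8) -> dict[str, bytearray]:
    # S11: group the URLs by their predetermined type.
    by_type = defaultdict(list)
    for url, url_type in url_db.items():
        by_type[url_type].append(url)

    filters = {}
    for url_type, urls in by_type.items():
        # S121/S122: count the URLs and allocate T = 2**n bytes per formulas 1 and 2.
        size_bytes = 1 << (len(urls) * bytes_per_url).bit_length()
        bits = bytearray(size_bytes)
        # S131/S132: set the flag bit for every hash value of every URL of this type.
        for url in urls:
            for i in hash_positions(url, k, size_bytes * 8):
                bits[i // 8] |= 1 << (i % 8)
        filters[url_type] = bits
    return filters


filters = build_classified_filters({"http://a.example/": "lawful", "http://b.example/": "unlawful"})
print(sorted(filters))  # ['lawful', 'unlawful']
```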
With the URL storage method of this embodiment, a Bloom filter is created for each type of URL and each URL is stored in the Bloom filter of its type. After the filtering gateway device obtains these Bloom filters, auditing a URL accessed by a user may require checking whether the URL is in the Bloom filter of a certain type, for example the Bloom filter storing URLs that may be lawfully accessed. The gateway only needs to compute the URL's hash values with that filter's hash functions: if the bits of the bit vector at all of these positions are 1, the URL is in this Bloom filter and is therefore a lawful-access address. Because the large URL library no longer has to be traversed for every URL to be audited, lookup efficiency improves greatly, and network performance can be maintained even when many users access the network simultaneously. Storing URLs in Bloom filters also saves a large amount of storage space: for example, for a URL library of 4,800,000 URLs (assuming each URL occupies 20 bytes), when all Bloom filters are fully used, the space required to store these URLs is only about 91.5 MB (4,800,000 × 20 B / 1024 / 1024 ≈ 91.55 MB). In addition, even if someone exports the URL library stored with the URL storage method of this embodiment, the individual URL entries cannot be recovered from it, so the URL library is kept confidential without any additional encryption.
Further, the URL storage method of the above embodiment may also comprise:
Step S14: when URLs are added to a Bloom filter, generate a delta package containing the hash values of the added URLs and the types of the added URLs. The delta package is sent to the filtering gateway device, which performs an incremental update according to it.
Specifically, adding URLs to a Bloom filter raises its false positive rate. Therefore, based on the number of URLs to be added, the URL server checks whether the false positive rate of the original Bloom filter will still be no more than the configured maximum after these URLs are added. If so, only a delta package is generated: the hash values of each URL to be added are computed with the hash functions of that Bloom filter and stored in the delta package together with the URL's type. If not, the Bloom filter of that type is regenerated so that it contains all URLs in the original Bloom filter plus the newly added URLs; the memory space T' occupied by the regenerated Bloom filter and the memory space T occupied by the original Bloom filter satisfy T' = 2T.
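The decision described above, delta package versus regeneration at twice the size, can be sketched as follows. The function names are hypothetical, k is assumed to stay unchanged across the rebuild, and, like the text, the sketch doubles the size only once; in practice one would re-check the false positive rate after doubling. A rebuild must rehash every URL of the type from the URL library, because the hash range changes with m.

```python
import math


def false_positive_rate(m_bits: int, n: int, k: int) -> float:
    """p = [1 - (1 - 1/m)**(k*n)]**k, the formula from step S123."""
    return (1.0 - math.exp(k * n * math.log1p(-1.0 / m_bits))) ** k


def plan_increment(m_bits: int, k: int, current_count: int, added: int, p_max: float) -> tuple[str, int]:
    """Return ("delta", m) if the filter can absorb the new URLs, else ("rebuild", 2 * m)."""
    if false_positive_rate(m_bits, current_count + added, k) <= p_max:
        # Still within the limit: ship only the hash values and type of each new URL.
        return ("delta", m_bits)
    # Limit exceeded: regenerate the type's filter at twice the size (T' = 2T).
    return ("rebuild", 2 * m_bits)


print(plan_increment(20_000_000, 8, 900_000, 10_000, 1e-4))   # ('delta', 20000000)
print(plan_increment(20_000_000, 8, 900_000, 200_000, 1e-4))  # ('rebuild', 40000000)
```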
Table 1 is exemplified with the storage format in delta package.As shown in table 1, while generating delta package in the manner described above, URL of every increase, needs the storage space of 4*N+1 byte.Wherein, N represents hash function number, the byte number (the each field of Hash1-Hash8 is 4 bytes) that each hashed value that 4 expression storages calculate will take, 1 represents that classification id corresponding to URL takies 1 byte (Class_id is a byte).In addition,, in the time increasing multiple URL, can also merge processing to generic URL, to save storage space.For example, when inserting new hash value in database, first inquire about and under this classification, whether have identical value, if had, this hash value is made as sky.Hash value only takies a byte when empty, so when adopting while storing in this way, if there is the URL of identical category to occur that hash value is identical, can save the storage space of 3 bytes for each identical hash value.And, if the corresponding part hash value of the URL of multiple increases is overlapping, can this overlapping hash value of not duplicate record, needn't store corresponding to the mode of 8 hash values according to URL record is fixing, to save storage space.
Table 1
| Field | Hash1 | Hash2 | Hash3 | Hash4 | Hash5 | Hash6 | Hash7 | Hash8 | Class_id |
|---|---|---|---|---|---|---|---|---|---|
| Value | a | b | c | d | e | f | g | h | 1 |
When the URL server sends the generated delta package to the filtering gateway device, the filtering gateway uses the hash values and types recorded in the delta package to set flags at the corresponding positions of the Bloom filter of each type, in the same way as the URL server performs step S13. This adds the URLs recorded in the delta package to the Bloom filters and thus incrementally updates the filtering gateway device.
With the URL storage method of this embodiment, a remote incremental update only requires the URL server to send the filtering gateway device a delta package containing a small amount of information rather than the entire Bloom filter, so remote updates are easy to perform.
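One possible byte layout for a Table 1 record (N = 8), together with the gateway-side application just described, is sketched below with Python's struct module. Assumptions, all hypothetical: big-endian byte order, hash values already reduced to bit positions of the gateway's filter, filters keyed by integer class id, and no empty-value merging.

```python
import struct

# Hash1..Hash8 as 4-byte unsigned integers, then a 1-byte Class_id: 4 * 8 + 1 = 33 bytes.
DELTA_RECORD = struct.Struct(">8IB")


def pack_delta_record(hash_values: list[int], class_id: int) -> bytes:
    """Server side: encode one added URL as a Table 1 record."""
    return DELTA_RECORD.pack(*hash_values, class_id)


def apply_delta_package(package: bytes, filters: dict[int, bytearray]) -> None:
    """Gateway side: set the flag bit for each recorded hash value in the filter of that class."""
    for offset in range(0, len(package), DELTA_RECORD.size):
        *hash_values, class_id = DELTA_RECORD.unpack_from(package, offset)
        bits = filters[class_id]
        for i in hash_values:
            bits[i // 8] |= 1 << (i % 8)


pkg = pack_delta_record([0, 1, 2, 3, 4, 5, 6, 7], class_id=1)
filters = {1: bytearray(4)}
apply_delta_package(pkg, filters)
print(len(pkg), filters[1].hex())  # 33 ff000000
```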
Further, the URL storage method of the above embodiment may also comprise:
Step S15: when URLs are deleted from a Bloom filter, generate a new Bloom filter without the deleted URLs, together with a deletion package. The deletion package contains the positions at which the Bloom filter before the deletion update and the new Bloom filter hold different values, and the types of the deleted URLs. The deletion package is sent to the filtering gateway device, which performs a deletion update according to it.
Specifically, the URL server generates the new Bloom filter in, for example, the same way as step S12 of the URL storage method above, and stores the URLs remaining after the deletion in it in the same way as step S13. It then compares the original and new Bloom filters; wherever the flag values at the same position differ, it records that position and the flag value of the new Bloom filter at that position in a separate file. After all positions have been compared, the file is saved, which produces the deletion package.
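The comparison step can be sketched as a bit-by-bit diff of the two filters. The function name is hypothetical, both filters are assumed to have the same size, and each record follows the Table 3 layout (position, value in the new filter, class id); dropping the middle field gives the Table 2 layout.

```python
def build_deletion_package(old_bits: bytes, new_bits: bytes, class_id: int) -> list[tuple[int, int, int]]:
    """Record (position, new bit value, class id) wherever the old and new filters differ."""
    if len(old_bits) != len(new_bits):
        raise ValueError("filters must have the same size")
    records = []
    for byte_index, (old, new) in enumerate(zip(old_bits, new_bits)):
        changed = old ^ new  # 1 bits mark positions whose flag value differs
        for bit in range(8):
            if changed & (1 << bit):
                position = byte_index * 8 + bit
                records.append((position, (new >> bit) & 1, class_id))
    return records


print(build_deletion_package(bytes([0b101]), bytes([0b001]), class_id=1))  # [(2, 0, 1)]
```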
Table 2 illustrates the storage format of a deletion package. As shown in Table 2, when a deletion package is generated as described above, each deleted URL requires N × (4 + 1) bytes of storage, where N is the number of hash functions, 4 is the number of bytes occupied by each stored hash value (bit position), and 1 is the byte occupied by the class id of the URL. A deletion package may also use a storage format similar to Table 1, in which case each deleted URL requires 4N + 1 bytes. Alternatively, the deletion package may use the format illustrated in Table 3, which records not only the position at which different values appear and the type of the deleted URL, but also the value at that position in the newly generated Bloom filter; each deleted URL then requires N × (4 + 1 + 1) bytes. These per-URL figures are theoretical values; in practice, because a modified position may be shared by several URLs, the actual space required is smaller.
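Table 2
| Sequence number | Position (4 bytes) | URL class id (1 byte) |
|---|---|---|
| Modification 1 | a | 1 |
| Modification 2 | b | 1 |
| Modification 3 | c | 1 |
| Modification 4 | d | 1 |
| Modification 5 | e | 1 |
| Modification 6 | f | 1 |
| Modification 7 | g | 1 |
| Modification 8 | h | 1 |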
Table 3
| Sequence number | Position (4 bytes) | Value at this position (1 byte) | URL class id (1 byte) |
|---|---|---|---|
| Modification 1 | a | 0 | 1 |
For a remote update, the URL server sends the deletion package to the filtering gateway device, which modifies its local Bloom filter according to the positions recorded in the package. For example, for position a in the deletion package, the corresponding byte is byte (a/8 + 1) (counting bytes from 1, with integer division) and the corresponding bit is bit (a % 8) within that byte; changing that bit from 1 to 0 completes the deletion update of the filtering gateway device.
With the URL storage method of this embodiment, a remote deletion update only requires the URL server to send the filtering gateway device a deletion package containing a small amount of information rather than the entire Bloom filter, so remote updates are easy to perform.
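On the gateway, applying a deletion package amounts to clearing the recorded bits. This hypothetical helper uses 0-based byte indexing, so byte a // 8 here is the (a/8 + 1)-th byte in the 1-based counting used above.

```python
def apply_deletion_package(bits: bytearray, positions: list[int]) -> None:
    """Clear bit a % 8 of byte a // 8 for every recorded position a."""
    for a in positions:
        bits[a // 8] &= ~(1 << (a % 8)) & 0xFF


bits = bytearray([0xFF, 0xFF])
apply_deletion_package(bits, [0, 9])  # position 9 -> second byte, bit 1
print(bits.hex())                     # fefd
```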
Fig. 3 is a schematic structural diagram of the URL storage device of the invention. As shown in Fig. 3, the URL storage device comprises:
a classification module, configured to classify URLs according to a predetermined classification rule;
a generation module, configured to generate a separate Bloom filter for storing the URLs of each type;
a storage module, configured to store each URL in the Bloom filter corresponding to its type.
The URL storage device stores URLs by the same process as the URL storage method of the above embodiment, so the details are not repeated here.
With the URL storage device of this embodiment, a Bloom filter is created for each type of URL and each URL is stored in the Bloom filter of its type. After the filtering gateway device obtains these Bloom filters, auditing a URL accessed by a user may require checking whether the URL is in the Bloom filter of a certain type, for example the Bloom filter storing URLs that may be lawfully accessed. The gateway only needs to compute the URL's hash values with that filter's hash functions: if the bits of the bit vector at all of these positions are 1, the URL is in this Bloom filter and is a lawful-access address. Because the large URL library no longer has to be traversed for every URL to be audited, lookup efficiency improves greatly, and network performance can be maintained even when many users access the network simultaneously. Storing URLs in Bloom filters also saves a large amount of storage space. In addition, even if someone exports the Bloom filters stored by this device, the individual URL entries cannot be recovered from them, so the URLs are kept confidential without any additional encryption.
Further, the URL storage device of the above embodiment may also comprise:
an incremental update module, configured to generate a delta package containing the hash values of the added URLs and the types of those URLs; the delta package is sent to the filtering gateway device, which performs an update according to it.
With the URL storage device of this embodiment, a remote incremental update only requires sending the filtering gateway device a delta package containing a small amount of information rather than the entire Bloom filter, so remote updates are easy to perform.
Further, in the URL storage device of the above embodiment, the incremental update module is also configured to check, based on the number of URLs in the Bloom filter after the addition, whether the false positive rate of the Bloom filter after the URLs are added is no more than the preset maximum false positive rate. If so, it generates the delta package; if not, it regenerates the Bloom filter, and the memory space occupied by the regenerated Bloom filter is twice that occupied by the Bloom filter before the URLs were added.
The URL storage device of this embodiment thus prevents the false positive rate of a Bloom filter from exceeding the preset maximum as a result of adding URLs.
Further, the URL storage device of the above embodiment may also comprise:
a deletion update module, configured to generate a new Bloom filter and a deletion package. The deletion package contains the positions at which the Bloom filter before the deletion update and the new Bloom filter hold different values, and the types of the deleted URLs; it is sent to the filtering gateway device, which performs an update according to it.
With the URL storage device of this embodiment, a remote deletion update only requires sending the filtering gateway device a deletion package containing a small amount of information rather than the entire Bloom filter, so remote updates are easy to perform.
Fig. 4 is a flow chart of the web page filtering method of the invention. As shown in Fig. 4, the web page filtering method comprises:
Step S21: the filtering gateway device obtains, from the URL storage device, the Bloom filters in which URLs are stored by type;
Step S22: the filtering gateway device filters web pages according to the URLs stored in the Bloom filters and the types of those URLs.
The URL storage device here is the URL storage device of any of the above embodiments. Specifically, after the filtering gateway device obtains the Bloom filters, whenever a URL needs to be audited and the device must check whether it is in the Bloom filter of a particular type, for example the Bloom filter storing URLs that may be lawfully accessed, it only needs to compute the URL's hash values with that filter's hash functions. If the bits of the bit vector at all of these positions are 1, the URL is in this Bloom filter and is a lawful-access address, and the network access is allowed; otherwise the network access is blocked.
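The gateway's allow-or-block decision can be sketched as below. All names are hypothetical, and the hash family (salted SHA-256) is an assumption; in a real deployment the gateway must use exactly the hash functions the URL server used to build the filter, and the filter size is taken from the received byte array.

```python
import hashlib


def hash_positions(url: str, k: int, m_bits: int) -> list[int]:
    # Assumption: must match the URL server's hash functions; salted SHA-256 is a stand-in.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{url}".encode()).digest()[:8], "big") % m_bits
        for i in range(k)
    ]


def in_filter(bits: bytes, url: str, k: int) -> bool:
    """True when every bit at the URL's hash positions is 1 (possibly a false positive)."""
    return all(bits[i // 8] & (1 << (i % 8)) for i in hash_positions(url, k, len(bits) * 8))


def audit(url: str, lawful_filter: bytes, k: int) -> str:
    """Allow the access only if the URL hits the lawful-access filter; otherwise block it."""
    return "allow" if in_filter(lawful_filter, url, k) else "block"


if __name__ == "__main__":
    lawful = bytearray(64)  # hypothetical 512-bit lawful-access filter
    for i in hash_positions("http://intranet.example/", 4, len(lawful) * 8):
        lawful[i // 8] |= 1 << (i % 8)
    print(audit("http://intranet.example/", lawful, 4))       # allow
    print(audit("http://other.example/", bytearray(64), 4))   # block
```

Because a Bloom filter can return false positives, a URL outside the lawful-access set is allowed with a probability of at most roughly the configured false positive rate; a lawful URL that was stored is never blocked.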
With the web page filtering method of this embodiment, URL auditing no longer requires traversing the large URL library for every URL to be audited, so lookup efficiency improves greatly and network performance can be maintained even when many users access the network simultaneously.
Further, the web page filtering method of the above embodiment may also comprise:
Step S23: the filtering gateway device obtains a delta package from the URL storage device and performs an incremental update according to it; the delta package contains the hash values of the added URLs and the types of the added URLs.
Specifically, the filtering gateway uses the hash values and types recorded in the delta package to set flags at the corresponding positions of the Bloom filter of each type, in the same way as step S13 of the URL storage method above. This adds the URLs recorded in the delta package to the Bloom filters and thus incrementally updates the filtering gateway device.
With the web page filtering method of this embodiment, a remote incremental update only requires the filtering gateway device to obtain a delta package containing a small amount of information from the URL storage device rather than the entire Bloom filter, so remote updates are easy to perform.
Further, the web page filtering method of the above embodiment may also comprise:
Step S24: the filtering gateway device obtains a deletion package from the URL storage device and performs a deletion update according to it; the deletion package contains the positions at which the Bloom filter before the URL deletion and the new Bloom filter generated by the URL storage device after the deletion hold different values, and the types of the deleted URLs.
Specifically, the filtering gateway device modifies its local Bloom filter according to the positions recorded in the deletion package and the flag values for those positions, thereby completing the deletion update of the filtering gateway device.
With the web page filtering method of this embodiment, a remote deletion update only requires the filtering gateway device to obtain a deletion package containing a small amount of information from the URL storage device rather than the entire Bloom filter, so remote updates are easy to perform.
Fig. 5 is a schematic structural diagram of the filtering gateway device of the invention. As shown in Fig. 5, the filtering gateway device comprises:
an acquisition module, configured to obtain, from the URL storage device, the Bloom filters in which URLs are stored by type;
a filtering module, configured to filter web pages according to the URLs stored in the Bloom filters and the types of those URLs.
The filtering gateway device of this embodiment performs web page filtering by the same process as the web page filtering method of the above embodiment, so the details are not repeated here.
With the filtering gateway device of this embodiment, when a URL is audited and the device must check whether it is in the Bloom filter of a particular type, it only needs to compute the URL's hash values with that filter's hash functions; if the bits of the bit vector at all of these positions are 1, the URL is in this Bloom filter. The large URL library no longer has to be traversed for every URL to be audited, so lookup efficiency improves greatly and network performance can be maintained even when many users access the network simultaneously.
Further, the filtering gateway device of the above embodiment also comprises:
Incremental update module, configured to obtain a delta package from the URL storage device and perform an incremental update according to it, the delta package comprising the hash values of the added URL and the type of the added URL.
With the filtering gateway device of the above embodiment, a remote incremental update requires obtaining from the URL storage device only a delta package containing a small amount of information, rather than the whole Bloom filter, so remote updates are easy to carry out.
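A minimal sketch of the delta package exchange, assuming the "hash values" in the package are the k bit positions of the added URL and that both devices use the same filter size and hash functions. The SHA-256 double-hashing scheme, the sizes, the helper names and the package layout are hypothetical. The gateway simply sets those bits in its local filter for the package's type.

```python
import hashlib

M, K = 1024, 4  # hypothetical filter size and hash count, identical on both devices


def hash_positions(url: str) -> list[int]:
    """The K hash values of url reduced to bit positions (assumed SHA-256 double hashing)."""
    d = hashlib.sha256(url.encode("utf-8")).digest()
    h1 = int.from_bytes(d[:8], "big")
    h2 = int.from_bytes(d[8:16], "big") | 1
    return [(h1 + i * h2) % M for i in range(K)]


def make_delta_package(url: str, url_type: str) -> dict:
    """Storage side: only the added URL's hash values and its type are shipped."""
    return {"type": url_type, "hashes": hash_positions(url)}


def apply_delta_package(local: dict, package: dict) -> None:
    """Gateway side: set the K bits in the local filter of the package's type."""
    bits = local.setdefault(package["type"], bytearray(M // 8))
    for p in package["hashes"]:
        bits[p // 8] |= 1 << (p % 8)


def contains(local: dict, url_type: str, url: str) -> bool:
    """Membership test against the gateway's local filter of one type."""
    bits = local.get(url_type)
    return bits is not None and all((bits[p // 8] >> (p % 8)) & 1 for p in hash_positions(url))


gateway: dict = {}
apply_delta_package(gateway, make_delta_package("new-ad.example/banner", "advertising"))
print(contains(gateway, "advertising", "new-ad.example/banner"))  # → True
```

The package carries K small integers and a type label, whereas the full filter for that type would be M bits.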
Further, the filtering gateway device of the above embodiment also comprises:
Deletion update module, configured to obtain a deletion package from the URL storage device and perform a deletion update according to it, the deletion package comprising the positions at which the Bloom filter before the URL was deleted and the new Bloom filter generated by the URL storage device after the deletion hold different values, and the type of the deleted URL.
With the filtering gateway device of the above embodiment, a remote deletion update requires obtaining from the URL storage device only a deletion package containing a small amount of information, rather than the whole Bloom filter, so remote updates are easy to carry out.
The present invention also provides a web filtering system, comprising the URL storage device of any of the above embodiments and the filtering gateway device of any of the above embodiments.
With the web filtering system of the above embodiment, network performance can be maintained even when a large number of users initiate network access at the same time.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent substitutions for some of their technical features, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A uniform resource locator (URL) storage method, characterized by comprising:
Step S11, classifying URLs according to a predetermined classification rule;
Step S12, generating a Bloom filter for storing the URLs of each type;
Step S13, storing each URL in the corresponding Bloom filter according to its type;
Step S15, when a URL is deleted from the Bloom filter, generating a new Bloom filter after the URL deletion and a deletion package, the deletion package comprising the positions at which the Bloom filter before the deletion update and the new Bloom filter hold different values, and the type of the deleted URL, wherein the deletion package is to be sent to a web filtering device so that the web filtering device performs a deletion update according to the deletion package.
2. The URL storage method according to claim 1, characterized by further comprising:
Step S14, when a URL is added to the Bloom filter, generating a delta package comprising the hash values of the added URL and the type of the added URL, wherein the delta package is to be sent to the web filtering device so that the web filtering device performs an incremental update according to the delta package.
3. The URL storage method according to claim 2, characterized by further comprising, before step S14:
checking, according to the number of URLs in the Bloom filter after the URL is added, whether the false positive rate of the Bloom filter after the addition does not exceed a predefined maximum false positive rate; if so, performing step S14; if not, regenerating the Bloom filter, the memory space occupied by the regenerated Bloom filter being twice that occupied by the Bloom filter before the URL was added.
4. A URL storage device, characterized by comprising:
Classification module, configured to classify URLs according to a predetermined classification rule;
Generation module, configured to generate a Bloom filter for storing the URLs of each type;
Storage module, configured to store each URL in the corresponding Bloom filter according to its type;
Deletion update module, configured to generate a new Bloom filter and a deletion package, the deletion package comprising the positions at which the Bloom filter before the deletion update and the new Bloom filter hold different values, and the type of the deleted URL, wherein the deletion package is to be sent to a web filtering device so that the web filtering device performs an update according to the deletion package.
5. The URL storage device according to claim 4, characterized by further comprising:
Incremental update module, configured to generate a delta package comprising the hash values of the added URL and the type of the added URL, wherein the delta package is to be sent to the web filtering device so that the web filtering device performs an update according to the delta package.
6. The URL storage device according to claim 5, characterized in that the incremental update module is further configured to check, according to the number of URLs in the Bloom filter after a URL is added, whether the false positive rate of the Bloom filter after the addition does not exceed a predefined maximum false positive rate; if so, to generate the delta package; if not, to regenerate the Bloom filter, the memory space occupied by the regenerated Bloom filter being twice that occupied by the Bloom filter before the URL was added.
7. A web filtering method, characterized by comprising:
Step S21, a web filtering device obtaining, from the URL storage device according to any one of claims 4 to 6, the Bloom filters in which URLs are stored by category;
Step S22, the web filtering device filtering web pages according to the URLs stored in the Bloom filters and the types of those URLs;
Step S24, the web filtering device obtaining a deletion package from the URL storage device and performing a deletion update according to the deletion package, the deletion package comprising the positions at which the Bloom filter before the URL was deleted and the new Bloom filter generated by the URL storage device after the deletion hold different values, and the type of the deleted URL.
8. The web filtering method according to claim 7, characterized by further comprising:
Step S23, the web filtering device obtaining a delta package from the URL storage device and performing an incremental update according to the delta package, the delta package comprising the hash values of the added URL and the type of the added URL.
9. A web filtering device, characterized by comprising:
Acquisition module, configured to obtain, from the URL storage device according to any one of claims 4 to 6, the Bloom filters in which URLs are stored by category;
Filtering module, configured to filter web pages according to the URLs stored in the Bloom filters and the types of those URLs;
Deletion update module, configured to obtain a deletion package from the URL storage device and perform a deletion update according to it, the deletion package comprising the positions at which the Bloom filter before the URL was deleted and the new Bloom filter generated by the URL storage device after the deletion hold different values, and the type of the deleted URL.
10. The web filtering device according to claim 9, characterized by further comprising:
Incremental update module, configured to obtain a delta package from the URL storage device and perform an incremental update according to it, the delta package comprising the hash values of the added URL and the type of the added URL.
11. A web filtering system, characterized by comprising the URL storage device according to any one of claims 4 to 6 and the web filtering device according to claim 9 or 10.
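The capacity check in claims 3 and 6 can be illustrated with the standard Bloom filter false-positive estimate p ≈ (1 - e^(-kn/m))^k, where n is the number of URLs after the addition, m the number of bits and k the number of hash functions. The claims do not say how the rate is estimated, so using this formula, keeping k unchanged when the filter is regenerated, and the function names below are assumptions made for illustration.

```python
import math


def false_positive_rate(n: int, m: int, k: int) -> float:
    """Standard estimate (1 - e^(-k*n/m))^k for n URLs in an m-bit filter with k hashes."""
    if n == 0:
        return 0.0
    return (1.0 - math.exp(-k * n / m)) ** k


def check_before_delta(n_after: int, m: int, k: int, max_fp: float) -> tuple[str, int]:
    """Sketch of claims 3 and 6: send a delta package if the estimated rate stays
    within max_fp, otherwise regenerate the filter with twice the memory
    (k kept unchanged, which is an assumption)."""
    if false_positive_rate(n_after, m, k) <= max_fp:
        return "send_delta", m
    return "regenerate", 2 * m


print(check_before_delta(90, 1024, 4, 0.01))   # → ('send_delta', 1024)
print(check_before_delta(100, 1024, 4, 0.01))  # → ('regenerate', 2048)
```

With these sample numbers, 90 URLs in 1024 bits give an estimated rate of about 0.77%, within a 1% limit, while 100 URLs give about 1.09% and trigger regeneration; in the doubled 2048-bit filter the same 100 URLs give roughly 0.1%.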
CN201110187962.9A 2011-05-25 2011-07-06 Uniform resource locator (URL) storage method, web filtering method, device and system Active CN102253991B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110187962.9A CN102253991B (en) 2011-05-25 2011-07-06 Uniform resource locator (URL) storage method, web filtering method, device and system

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201110137020.X 2011-05-25
CN201110137020 2011-05-25
CN201110187962.9A CN102253991B (en) 2011-05-25 2011-07-06 Uniform resource locator (URL) storage method, web filtering method, device and system

Publications (2)

Publication Number Publication Date
CN102253991A CN102253991A (en) 2011-11-23
CN102253991B CN102253991B (en) 2014-07-30

Family

ID=44981255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110187962.9A Active CN102253991B (en) 2011-05-25 2011-07-06 Uniform resource locator (URL) storage method, web filtering method, device and system

Country Status (1)

Country Link
CN (1) CN102253991B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9582565B2 (en) 2014-06-04 2017-02-28 International Business Machines Corporation Classifying uniform resource locators

Families Citing this family (9)

Publication number Priority date Publication date Assignee Title
CN103383665B (en) * 2013-07-12 2016-04-27 北京奇虎科技有限公司 Method and device for data caching suitable for URL data crawling
CN103544316B (en) * 2013-11-06 2017-02-08 苏州大拿信息技术有限公司 Uniform resource locator (URL) filtering system and implementation method thereof
CN105119916B (en) * 2015-08-21 2018-04-10 福建天晴数码有限公司 HTTP-based authentication method and system
CN105320740B (en) * 2015-09-22 2018-10-16 清华大学 Method and system for acquiring WeChat articles and public accounts
CN105653627A (en) * 2015-12-28 2016-06-08 湖南蚁坊软件有限公司 Bloom filter-based data classification method
CN106970984B (en) * 2017-03-29 2020-11-06 杭州迪普科技股份有限公司 URL filter library updating method and device
CN107888659A (en) * 2017-10-12 2018-04-06 北京京东尚科信息技术有限公司 Method and system for processing user requests
CN109977261B (en) * 2019-04-02 2021-11-26 北京奇艺世纪科技有限公司 Data request processing method, device and server
CN112948370B (en) * 2019-11-26 2023-04-11 上海哔哩哔哩科技有限公司 Data classification method and device and computer equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101261644A (en) * 2008-04-30 2008-09-10 杭州华三通信技术有限公司 Method and device for accessing a uniform resource locator database
JP2010123000A (en) * 2008-11-20 2010-06-03 Nippon Telegr & Teleph Corp <Ntt> Web page group extraction method, device and program
CN101901248A (en) * 2010-04-07 2010-12-01 北京星网锐捷网络技术有限公司 Method and device for creating and updating Bloom filter and searching elements
CN101923568A (en) * 2010-06-23 2010-12-22 北京星网锐捷网络技术有限公司 Method for increasing and canceling elements of Bloom filter and Bloom filter

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JPH11167580A (en) * 1997-12-04 1999-06-22 Nec Corp Automatic sorting device and method for url of web client

Also Published As

Publication number Publication date
CN102253991A (en) 2011-11-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant