WO2013097494A1 - Method and device for filtering uniform resource locator (url) - Google Patents

Method and device for filtering uniform resource locator (URL)

Info

Publication number
WO2013097494A1
Authority
WO
WIPO (PCT)
Prior art keywords
url
category
webpage
connection request
content
Prior art date
Application number
PCT/CN2012/081548
Other languages
French (fr)
Chinese (zh)
Inventor
蒋武
薛智慧
李世光
万时光
Original Assignee
华为数字技术(成都)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为数字技术(成都)有限公司
Publication of WO2013097494A1
Priority to US14/307,014 (US9331981B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0245 Filtering by information in the payload
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0236 Filtering by address, protocol, port number or service, e.g. IP-address or URL
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Definitions

  • The present invention relates to the field of data communications, and more particularly to a method and apparatus for filtering uniform resource locators (URLs).
  • Background: URL (Uniform Resource Locator) filtering technology has been widely applied in the field of communications and has become a mature technology in the field of application content security.
  • The technology can filter out webpages of a set category according to the needs of the user, based on the category to which each webpage belongs, for example filtering out news webpages.
  • When a security device with URL filtering detects a connection request, it obtains the category to which the requested URL belongs by remotely querying a classification server, and then caches that category in local storage.
  • In the prior art, once the category of a URL has been obtained, if the user accesses the URL again within the aging time of the cache, filtering can be performed directly according to the cached category; after the cache exceeds the aging time, the remote query must be performed again to obtain the category to which the URL belongs.
  • In practice, however, many webpages contain dynamic content; that is, the specific content of a requested webpage, and the category to which it belongs, may differ from one time period to another.
  • If the classification server does not detect the change in webpage content in time, or the classification server has been updated but the category cached in the security device is still within its aging time, the category to which the URL belongs cannot be updated in a timely manner. In this case, some requests that should have been filtered out may be allowed through.
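The aging-time cache described in the background can be illustrated with a short Python sketch. It is not taken from the patent: the class name `CategoryCache`, the `query_remote` callable standing in for the classification server, and the injectable `clock` are all assumptions made for illustration. The point it shows is that a category change on the server stays invisible to the security device until the cached entry ages out.

```python
import time
from typing import Callable, Dict, Tuple


class CategoryCache:
    """Hypothetical local cache of URL -> (category, time stored) with an aging time."""

    def __init__(self, query_remote: Callable[[str], str], aging_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._query_remote = query_remote    # stands in for the remote classification server
        self._aging_seconds = aging_seconds
        self._clock = clock                  # injectable so the behaviour can be tested
        self._entries: Dict[str, Tuple[str, float]] = {}

    def category_of(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[1] < self._aging_seconds:
            # Still within the aging time: use the cached category, no remote query.
            return entry[0]
        # Missing or aged out: query the classification server again and re-cache.
        category = self._query_remote(url)
        self._entries[url] = (category, now)
        return category
```

With a 60-second aging time, a URL reclassified on the server 10 seconds after it was cached keeps its old category locally for the remaining 50 seconds, which is the window the invention aims to close.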
  • The embodiments of the present invention provide a method and a device for filtering a uniform resource locator (URL), so as to overcome the problem that URLs cannot be filtered accurately because the category of a URL may not be updated in time.
  • The present invention provides the following technical solutions:
  • A method for filtering a uniform resource locator (URL), including:
  • where the URL pass policy includes the webpage categories that are allowed to pass.
  • A filtering device for a uniform resource locator (URL), comprising:
  • a request receiving module, configured to receive a URL connection request initiated by the client;
  • a first category obtaining module, configured to look up, among the webpage categories corresponding to the URLs in a pre-stored category information table, a first category corresponding to the URL carried in the URL connection request;
  • a pass judging module, configured to determine whether the first category conforms to a preset URL pass policy, where the URL pass policy includes the webpage categories that are allowed to pass;
  • a request sending module, configured to send the URL connection request to the corresponding server when the judgment result of the pass judging module is yes, and to receive the webpage content returned by the server;
  • a category determining module, configured to determine, according to the webpage content, a second category corresponding to the URL, and to determine whether the second category conforms to the preset URL pass policy;
  • a content returning module, configured to send the webpage content to the client when the judgment result of the category determining module is yes;
  • a blocking module, configured to block the webpage content when the judgment result of the category determining module is no.
  • The embodiment of the present invention discloses a URL filtering method and device that look up, in a pre-stored category information table, a first category corresponding to the URL carried in a URL connection request. A URL connection request whose first category conforms to a preset URL pass policy is forwarded to the corresponding server; the second category corresponding to the URL is then determined according to the webpage content returned by the server, and it is determined whether the second category conforms to the preset URL pass policy. If the second category conforms to the preset URL pass policy, the webpage content is sent to the client; otherwise, the webpage content is blocked.
  • The URL filtering method and device can determine the category to which a URL belongs in real time, and ensure that, when the webpage content has changed or the category has not been updated in time, a URL connection request that was released but should actually be blocked is blocked in time, thereby achieving accurate category filtering.
  • Drawings:
  • FIG. 1 is a flowchart of a first URL filtering method according to an embodiment of the present invention
  • FIG. 2 is a flowchart of determining a second category of webpage content according to an embodiment of the present invention
  • FIG. 3a is a flowchart of a second URL filtering method according to an embodiment of the present invention.
  • FIG. 3b is a flowchart of a third filtering method of a URL according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a URL filtering apparatus according to an embodiment of the present invention
  • FIG. 5 is a schematic structural diagram of a first category obtaining module according to an embodiment of the present invention
  • FIG. 6 is a schematic structural diagram of a category determining module according to an embodiment of the present invention
  • FIG. 7 is a schematic structural diagram of a second URL filtering apparatus according to an embodiment of the present invention
  • FIG. 8 is a schematic structural diagram of a third URL filtering apparatus according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a first method for filtering a URL according to an embodiment of the present invention.
  • the URL filtering method may include:
  • Step 101: Receive a URL connection request initiated by a client.
  • Step 102: Look up, among the webpage categories corresponding to the URLs in a pre-stored category information table, a first category corresponding to the URL carried in the URL connection request.
  • The pre-stored category information table may be cached locally or pre-stored on a remote classification server.
  • The user may also store the category information corresponding to all existing URLs locally and update it periodically, which can meet the needs of URL connection request filtering to a certain extent.
  • The pre-stored category information table can take various forms, for example a record table or a file. The storage form is not limited to these two; any form that can indicate the correspondence between URLs and webpage categories can be used.
  • If the locally cached category information table does not contain the category information corresponding to the URL carried in the URL connection request, or the previously cached category information has passed its aging time, the pre-stored category information table is obtained from the remote classification server and cached locally, and the first category corresponding to the URL carried in the URL connection request is then looked up according to the correspondence between URLs and webpage categories in the category information table.
  • Step 103: Determine whether the first category conforms to a preset URL pass policy, where the URL pass policy includes the webpage categories that are allowed to pass. When the first category conforms to the preset URL pass policy, proceed to step 104; when it does not, block the URL connection request.
  • Step 104: Send the URL connection request to the corresponding server, and receive the webpage content returned by the server.
  • The relevant device sends the URL connection request to the server corresponding to the URL; on receiving the connection request, the server returns the webpage content requested by the connection request to the relevant device.
  • Step 105: Determine the second category corresponding to the URL according to the webpage content. Optionally, for a specific method of determining the second category of the URL, refer to FIG. 2.
  • FIG. 2 shows the method for determining the second category of webpage content disclosed in the embodiment of the present invention. As shown in the figure, determining the second category of the webpage content may include:
  • Step 201: Decode the webpage content and extract the identification keywords of the webpage content. The identification keywords extracted here are, for example, star, microblog, or short message; they may also be sensitive character strings, such as SARS or US president.
  • Step 202: According to the correspondence between identification keywords and webpage categories stored in a local vocabulary list, determine that the second category corresponding to the URL is the webpage category corresponding to the extracted identification keyword.
  • Some keywords or sensitive characters can be classified as follows:
  • The identification keywords corresponding to the entertainment category include: background, qq space, blessing SMS, funny SMS, etc.
  • The identification keywords corresponding to the news category include: military, financial, reporting, newspaper, etc.
  • The identification keywords corresponding to the sports category include: streetball, basketball, football, nautical, aerobics, etc.
  • If the data content extracted in step 201 includes the keyword "soccer", the webpage category of the URL corresponding to the data content may be identified as the sports category. If the entity executing the URL filtering method has further specified that URLs of the sports category are not allowed to be accessed, the webpage content including that data content is blocked.
  • The method for determining the second category of the URL according to the webpage content is not limited to the above process. For example, a semantic relationship can be extracted from the webpage content and matched against the semantic relationship templates in a pre-stored semantic library, and the webpage category corresponding to the matching template can be used as the second category; this also achieves the purpose of determining the second category according to the webpage content. The specific ways of determining the second category according to the webpage content are not enumerated one by one here; any method capable of determining the second category of the URL according to the webpage content should fall within the scope of protection of the present invention.
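Steps 201 and 202 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the vocabulary entries reuse a few keywords from the lists above, while the function names, the crude tag stripping used as "decoding", and the rule that the category with the most keyword hits wins are assumptions introduced here.

```python
import re
from typing import Dict, Optional

# Hypothetical local vocabulary list: identification keyword -> webpage category.
VOCABULARY: Dict[str, str] = {
    "qq space": "entertainment",
    "funny sms": "entertainment",
    "military": "news",
    "financial": "news",
    "basketball": "sports",
    "football": "sports",
}


def decode_page(raw: bytes, encoding: str = "utf-8") -> str:
    """Step 201 (first half): decode the page and strip markup, very roughly."""
    text = raw.decode(encoding, errors="replace")
    text = re.sub(r"<[^>]+>", " ", text)          # drop HTML tags
    return re.sub(r"\s+", " ", text).strip().lower()


def second_category(raw: bytes,
                    vocabulary: Dict[str, str] = VOCABULARY) -> Optional[str]:
    """Steps 201-202: extract identification keywords and map them to a category."""
    text = decode_page(raw)
    hits_per_category: Dict[str, int] = {}
    for keyword, category in vocabulary.items():
        hits = text.count(keyword)                # plain substring matching
        if hits:
            hits_per_category[category] = hits_per_category.get(category, 0) + hits
    if not hits_per_category:
        return None                               # no identification keyword found
    # Assumption: the category with the most keyword hits is taken as the second category.
    return max(hits_per_category, key=hits_per_category.get)
```

Substring counting is deliberately simple; a semantic-template matcher of the kind mentioned above could replace `second_category` without changing how the result is used.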
  • Step 106: Determine whether the second category conforms to the preset URL pass policy; if yes, go to step 107; if no, go to step 108.
  • Step 107: Send the webpage content to the client.
  • Step 108: Block the webpage content.
  • If the second category evaluated in step 106 does not belong to the webpage categories allowed to pass in the pass policy set by the user, access is directly prohibited and the related webpage content is blocked.
  • The method looks up the first category corresponding to the URL carried in the URL connection request in the pre-stored category information table, releases a URL connection request whose first category conforms to the preset URL pass policy and forwards it to the corresponding server, determines the second category corresponding to the URL according to the webpage content returned by the server, and determines whether the second category conforms to the preset URL pass policy. If the second category conforms to the preset URL pass policy, the webpage content is sent to the client; otherwise, the webpage content is blocked.
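The two-stage check of steps 101 to 108 can be summarized in a short Python sketch. The function and parameter names are hypothetical, `fetch` and `classify` are stand-ins for the server round trip and the content-based classifier, and treating a URL that is missing from the table as blocked is a simplifying assumption (the text instead describes fetching the table from the remote classification server in that case).

```python
from typing import Callable, Dict, Optional, Set


def filter_request(url: str,
                   category_table: Dict[str, str],
                   allowed: Set[str],
                   fetch: Callable[[str], bytes],
                   classify: Callable[[bytes], Optional[str]]) -> Optional[bytes]:
    """Return the webpage content to forward to the client, or None if blocked."""
    first = category_table.get(url)     # Step 102: first category from the table
    if first not in allowed:            # Step 103: check the URL pass policy
        return None                     # request blocked before reaching the server
    content = fetch(url)                # Step 104: forward request, receive content
    second = classify(content)          # Step 105: second category from the content
    if second in allowed:               # Step 106: re-check the URL pass policy
        return content                  # Step 107: send content to the client
    return None                         # Step 108: block the content
```

The second check is what catches a page whose cached first category is still allowed but whose current content no longer is.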
  • FIG. 3a is a flowchart of a second URL filtering method according to an embodiment of the present invention.
  • The URL filtering method may include:
  • Step 301: Receive a URL connection request initiated by a client.
  • Step 302: Look up, among the webpage categories corresponding to the URLs in the pre-stored category information table, a first category corresponding to the URL carried in the URL connection request.
  • Step 303: Determine whether the first category conforms to the preset URL pass policy; if yes, go to step 306; if no, go to step 304.
  • Step 304: Block the URL connection request, and proceed to step 305.
  • Step 305: From the blocked URL connection requests, filter out the URL connection requests that carry a preset identifier, and proceed to step 306.
  • The content of some webpages changes dynamically. The same URL may fail the URL pass policy before a certain time, yet actually belong to a category that conforms to the URL pass policy after that time, while the classification of the URL has not been updated in time, so that a URL connection request that should be allowed to pass is blocked. To avoid this, the user may use the method described in this step. The preset identifier may be a specific keyword, a fixed connection address, a user name, etc.
  • Step 306: Send the URL connection request to the corresponding server, and receive the webpage content returned by the server.
  • Step 307: Determine, according to the webpage content, a second category corresponding to the URL.
  • Step 308: Determine whether the second category conforms to the preset URL pass policy; if yes, go to step 309; if no, go to step 310.
  • Step 309: Send the webpage content to the client, and proceed to step 311.
  • Step 310: Block the webpage content.
  • Step 311: Update the webpage category corresponding to the URL carried in the URL connection request in the pre-stored category information table to the second category.
  • The process shown in FIG. 3a may also be adjusted as follows: the order of step 305 and step 304 is interchanged; that is, before a URL connection request is blocked, it is determined whether the URL connection request carries a preset identifier. If it does, the URL connection request is allowed to pass; otherwise, it is blocked. The batch processing mode is thereby changed to real-time processing. The adjusted method may include:
  • Step 321: Receive the URL connection request initiated by the client.
  • Step 322: Look up, among the webpage categories corresponding to the URLs in the pre-stored category information table, a first category corresponding to the URL carried in the URL connection request.
  • Step 323: Determine whether the first category conforms to the preset URL pass policy; if yes, go to step 326; if no, go to step 324.
  • Step 324: Determine whether the URL connection request carries a preset identifier; if so, proceed to step 326; otherwise, proceed to step 325.
  • Step 325: Block the URL connection request.
  • Step 326: Send the URL connection request to the corresponding server, and receive the webpage content returned by the server.
  • Step 327: Determine, according to the webpage content, a second category corresponding to the URL.
  • Step 329: Send the webpage content to the client, and proceed to step 331.
  • Step 330: Block the webpage content.
  • Step 331: Update the webpage category corresponding to the URL carried in the URL connection request in the pre-stored category information table to the second category.
  • the URL pass policy, it can be known that the first category of the relevant URL, whether cached locally or obtained from the remote classification server, is inaccurate. The first category corresponding to the locally cached URL can then be updated to the second category determined according to the webpage content, or a URL classification change request can be sent to the remote classification server so that the classification server can take the relevant follow-up action according to the URL classification change request.
  • The URL filtering method determines, according to the pre-stored first category of the URL, whether the URL connection request initiated by the client conforms to a preset URL pass policy. A URL connection request that satisfies the preset URL pass policy is forwarded to the corresponding server, the second category is determined according to the webpage content returned by the server, and it is determined whether the second category conforms to the preset URL pass policy. If not, the returned webpage content is blocked. In addition, the method can avoid the situation in which some URL connection requests that should be released are blocked because the classification was not updated in time.
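The real-time variant (steps 321 to 331) adds two things to the basic flow: a request that fails the first check may still proceed if it carries a preset identifier, and the category table is refreshed with the second category. The Python sketch below illustrates this under stated assumptions: all names are hypothetical, the preset identifier is modelled as a predicate on the URL, and the table is updated whenever a second category has been determined, which is one reading of step 331.

```python
from typing import Callable, Dict, Optional, Set


def filter_with_identifier(url: str,
                           category_table: Dict[str, str],
                           allowed: Set[str],
                           has_preset_identifier: Callable[[str], bool],
                           fetch: Callable[[str], bytes],
                           classify: Callable[[bytes], str]) -> Optional[bytes]:
    """Return the content to forward to the client, or None if blocked."""
    first = category_table.get(url)                     # Step 322
    if first not in allowed:                            # Step 323
        if not has_preset_identifier(url):              # Step 324
            return None                                 # Step 325: block the request
    content = fetch(url)                                # Step 326
    second = classify(content)                          # Step 327
    # Step 331 (assumed to apply on both branches): refresh the stored category.
    category_table[url] = second
    return content if second in allowed else None       # Steps 329 / 330
```

Because the table is refreshed from live content, a URL that was wrongly blocked on its stale first category passes on its own the next time, without needing the preset identifier.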
  • FIG. 4 is a schematic structural diagram of a URL filtering apparatus according to an embodiment of the present invention.
  • As shown in FIG. 4, the URL filtering device 40 can include:
  • a request receiving module 401, configured to receive a URL connection request initiated by the client;
  • a first category obtaining module 402, configured to look up, among the webpage categories corresponding to the URLs in the pre-stored category information table, a first category corresponding to the URL carried in the URL connection request. For the specific structure of the first category obtaining module 402, refer to FIG. 5; as shown in the figure, the first category obtaining module 402 may specifically include:
  • an information table obtaining sub-module 4021, configured to obtain the pre-stored category information table from a remote classification server and cache it locally;
  • if the category information table containing the URL corresponding to the URL connection request is already cached locally, the category information table can be obtained directly from the local cache;
  • a first category determining sub-module 4022, configured to look up, according to the correspondence between URLs and webpage categories in the category information table, the first category corresponding to the URL carried in the URL connection request;
  • a pass determining module 403, configured to determine whether the first category conforms to a preset URL pass policy, where the URL pass policy includes the webpage categories that are allowed to pass;
  • a request sending module 404, configured to send the URL connection request to the corresponding server when the determination result of the pass determining module 403 is yes, and to receive the webpage content returned by the server;
  • a category determining module 405, configured to determine, according to the webpage content, a second category corresponding to the URL, and to determine whether the second category conforms to the preset URL pass policy. The category determining module 405 may include:
  • a webpage decoding sub-module 4051, configured to decode the webpage content and extract the identification keywords of the webpage content;
  • a second category determining sub-module 4052, configured to determine, according to the correspondence between identification keywords and webpage categories stored in the local vocabulary list, that the second category corresponding to the URL is the webpage category corresponding to the extracted identification keyword;
  • a category judging sub-module 4053, configured to determine whether the second category identified by the second category determining sub-module 4052 conforms to the URL pass policy;
  • a content returning module 406, configured to send the webpage content to the client when the determination result of the category determining module 405 is yes;
  • a blocking module 407, configured to block the webpage content when the determination result of the category determining module 405 is no.
  • The URL filtering apparatus of the embodiment of the present invention is not limited to the foregoing structure.
  • For example, the first category obtaining module 402 and the pass determining module 403 may be integrated into a single independent module that obtains the first category of the URL connection request and determines whether the first category conforms to the preset pass policy; likewise, the content returning module 406 and the blocking module 407 may be one module.
  • The URL filtering apparatus may further include a category updating module, configured to update the webpage category corresponding to the URL carried in the URL connection request in the locally cached category information table to the second category when the category determining module 405 determines that the identified category does not conform to the preset URL pass policy.
  • The blocking module 407 is further configured to block the URL connection request when the pass determining module 403 determines that the first category does not conform to the preset URL pass policy.
  • The URL filtering apparatus may further include an identifier filtering module 701, configured to filter out, from the blocked URL connection requests, the URL connection requests that carry a preset identifier, and to trigger the request sending module 404 to send such a URL connection request to the corresponding server and receive the webpage content returned by the server. The category determining module 405 then determines a second category corresponding to the URL according to the webpage content and determines whether the second category conforms to the preset URL pass policy; if it does, the content returning module 406 sends the webpage content to the client; otherwise, the blocking module 407 blocks the webpage content.
  • The URL filtering apparatus may further include an identifier determining module 702, configured to determine whether the URL connection request carries a preset identifier when the determination result of the pass determining module 403 is no. If the preset identifier is present, the request sending module 404 is triggered to send the URL connection request to the corresponding server and receive the webpage content returned by the server; the category determining module 405 determines a second category corresponding to the URL according to the webpage content and determines whether the second category conforms to the preset URL pass policy; if it does, the content returning module 406 sends the webpage content to the client; otherwise, the blocking module 407 is triggered to block the webpage content.
  • If the preset identifier is absent, the blocking module 407 is triggered to block the URL connection request.
  • The device looks up the first category corresponding to the URL connection request in the pre-stored category information table, releases a URL connection request whose first category conforms to the preset URL pass policy and forwards it to the corresponding server, determines the second category corresponding to the URL according to the webpage content returned by the server, and determines whether the second category conforms to the preset URL pass policy. If the second category conforms to the preset URL pass policy, the webpage content is sent to the client; otherwise, the webpage content is blocked.
  • The classification of the URL can thus be determined in real time, and if the classification has not been updated in time, a URL connection request that was released but should actually be blocked can be blocked in time, achieving accurate category filtering.
  • The embodiment of the present invention further discloses a gateway.
  • The gateway 90 includes the URL filtering apparatus 40 disclosed in the embodiment of the present invention. It first receives a URL connection request initiated by a client and determines whether the URL connection request conforms to a preset URL pass policy. If so, it sends the URL connection request to the corresponding server and receives the webpage content returned by the server. It then determines a second category from the webpage content and determines whether the second category conforms to the preset URL pass policy: if so, the returned webpage content is sent to the client; if not, the returned webpage content is blocked. The gateway can thus determine in real time the classification to which the URL belongs, ensuring that, if the classification has not been updated in time, a URL connection request that was released but should actually be blocked is blocked in time, achieving accurate category filtering.
  • random access memory (RAM)
  • read-only memory (ROM)
  • electrically programmable ROM
  • electrically erasable programmable ROM
  • registers
  • hard disk
  • removable disk
  • CD-ROM

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and device for filtering a uniform resource locator (URL). The method allows for: finding in a prestored category information table a first category corresponding to a URL connection request, clearing the first category URL connection request conforming to a predetermined URL access policy, forwarding to a corresponding server, determining a second category corresponding to the URL on the basis of a webpage content returned by the server, determining if the second category conforms to the predetermined URL access policy, and, if the second category conforms to the predetermined URL access policy, transmitting the webpage content to a client terminal; if otherwise, blocking the webpage content. The method and device disclosed for filtering the URL allow for real-time determination of the category to which the URL belongs, ensure that, in a scenario of non-timely category update, a URL connection request that has been cleared but should in fact be blocked is blocked in a timely manner, and implement the feature of accurate category filtration.

Description

统一资源定位符 URL的过滤方法及装置 本申请要求于 2011 年 12 月 31 日提交中国专利局、 申请号为 201110459686.7、 发明名称为 "统一资源定位符 URL的过滤方法、 装置及网 关" 的中国专利申请, 以及于 2012年 2月 1 日提交中国专利局、 申请号为 201210022574.X, 发明名称为 "统一资源定位符 URL的过滤方法及装置" 的 中国专利申请的优先权, 其全部内容通过引用结合在本申请中。 技术领域  Method and device for filtering uniform resource locator URL This application claims to be submitted to the Chinese Patent Office on December 31, 2011, the application number is 201110459686.7, and the Chinese patent entitled "Uniform Resource Locator URL Filtering Method, Device and Gateway" Application, and the priority of the Chinese Patent Application entitled "Uniform Resource Locator URL Filtering Method and Apparatus" filed on February 1, 2012 by the Chinese Patent Office, Application No. 201210022574.X, the entire contents of which are hereby incorporated by reference. Combined in this application. Technical field
本发明涉及数据通信领域, 更具体的说, 是涉及统一资源定位符 URL的 过滤方法及装置。 背景技术  The present invention relates to the field of data communications, and more particularly to a filtering method and apparatus for a uniform resource locator URL. Background technique
URL (Uniform Resource Locator) filtering is now widely used in the communications field and has become a mature technology in application content security. Based on the category to which each webpage belongs, it filters out webpages of specified categories according to the user's needs; for example, it can filter out news webpages.
When a security device with URL filtering detects a connection request, it remotely queries a classification server for the category of the URL that the request asks to connect to, and caches the result in local storage. In the prior art, once the category of a URL has been obtained, if the user accesses the URL again within the aging time of the cache, filtering is performed directly according to the cached category; after the cached entry exceeds its aging time, the remote query must be performed again to obtain the category of the URL.
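The prior-art caching behaviour described above can be illustrated with a minimal Python sketch. The class name `CategoryCache`, the `query_remote` callable standing in for the remote classification server, and the injectable `clock` are all assumptions made for illustration; the description does not specify any data structure or interface.

```python
import time
from typing import Callable, Dict, Tuple


class CategoryCache:
    """Local cache of URL -> category with an aging time (hypothetical helper)."""

    def __init__(self, query_remote: Callable[[str], str], aging_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._query_remote = query_remote    # stands in for the classification server
        self._aging_seconds = aging_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # url -> (category, time cached)

    def category_of(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[1] < self._aging_seconds:
            return entry[0]                   # within the aging time: reuse cached category
        category = self._query_remote(url)    # missing or aged out: query remotely again
        self._entries[url] = (category, now)
        return category
```

Under these assumptions, a category that changes on the server while the cached entry is still within its aging time is not seen by the device until the entry ages out.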
In practice, however, many webpages have dynamic content: the specific content of a requested webpage, and therefore the category it belongs to, may differ from one period to another. If the classification server does not detect the change in webpage content in time, or the classification server has been updated but the category cached in the security device is still within its aging time, the category of the URL is not updated promptly. In that case, some URL connection requests that should have been filtered out are likely to be allowed through, so URL category identification is inaccurate and filtering cannot be performed accurately.
Summary of the Invention
In view of this, embodiments of the present invention provide a method and device for filtering a uniform resource locator (URL), to overcome the prior-art problem that URLs cannot be filtered accurately because the classification server may not be updated in time.
To achieve the above objective, the present invention provides the following technical solutions:
A method for filtering a uniform resource locator (URL), including:
receiving a URL connection request initiated by a client;
finding, among the webpage categories corresponding to each URL in a pre-stored category information table, a first category corresponding to the URL carried in the URL connection request;
determining whether the first category conforms to a preset URL access policy, where the URL access policy contains the webpage categories that are allowed to pass;
if it conforms, sending the URL connection request to the corresponding server and receiving the webpage content returned by the server; and
determining, according to the webpage content, a second category corresponding to the URL, and determining whether the second category conforms to the preset URL access policy; if the second category conforms to the preset URL access policy, sending the webpage content to the client; otherwise, blocking the webpage content.
A device for filtering a uniform resource locator (URL), including:
a request receiving module, configured to receive a URL connection request initiated by a client;
a first category obtaining module, configured to find, among the webpage categories corresponding to each URL in a pre-stored category information table, a first category corresponding to the URL carried in the URL connection request;
an access determining module, configured to determine whether the first category conforms to a preset URL access policy, where the URL access policy contains the webpage categories that are allowed to pass;
a request sending module, configured to, when the determination result of the access determining module is yes, send the URL connection request to the corresponding server and receive the webpage content returned by the server;
a category determining module, configured to determine, according to the webpage content, a second category corresponding to the URL, and to determine whether the second category conforms to the preset URL access policy;
a content returning module, configured to send the webpage content to the client when the determination result of the category determining module is yes; and
a blocking module, configured to block the webpage content when the determination result of the category determining module is no.
Embodiments of the present invention disclose a URL filtering method and device. The method finds, in a pre-stored category information table, the first category corresponding to the URL carried in a URL connection request; when the URL connection request conforms to the preset URL access policy, forwards the request to the corresponding server; determines the second category corresponding to the URL according to the webpage content returned by the server; and then determines whether the second category conforms to the preset URL access policy. If it does, the webpage content is sent to the client; otherwise, the webpage content is blocked. The method and device determine the category of a URL in real time, so that even when webpage content changes or the category is not updated in time, a URL connection request that was allowed through but should in fact be blocked is still blocked promptly, achieving accurate category-based filtering.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a first URL filtering method disclosed in an embodiment of the present invention;
FIG. 2 is a flowchart of determining the second category of webpage content disclosed in an embodiment of the present invention;
FIG. 3a is a flowchart of a second URL filtering method disclosed in an embodiment of the present invention;
FIG. 3b is a flowchart of a third URL filtering method disclosed in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a URL filtering device disclosed in an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a first category obtaining module disclosed in an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a category determining module disclosed in an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a second URL filtering device disclosed in an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a third URL filtering device disclosed in an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a gateway disclosed in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment 1
FIG. 1 is a flowchart of the first URL filtering method disclosed in an embodiment of the present invention. As shown in FIG. 1, the URL filtering method may include:
Step 101: Receive a URL connection request initiated by a client;
Step 102: From the webpage categories corresponding to each URL in a pre-stored category information table, find the first category corresponding to the URL carried in the URL connection request;
Optionally, the pre-stored category information table may be cached locally or pre-stored on a remote classification server. A user may also store the category information for all existing URLs locally and update it periodically, which meets the needs of filtering URL connection requests to a certain extent;
The pre-stored category information table may take various forms, for example a record table or a file. The storage form is not limited to these two: any form that can indicate the correspondence between URLs and webpage categories may be used;
If the locally cached category information table has no category information for the URL carried in the URL connection request, or the previously cached category information has passed its aging time, the pre-stored category information table must be obtained from the remote classification server and cached locally. The first category corresponding to the URL carried in the URL connection request is then looked up according to the correspondence between URLs and webpage categories in the table;
Step 103: Determine whether the first category conforms to the preset URL access policy, where the URL access policy contains the webpage categories that are allowed to pass. If the first category conforms to the preset URL access policy, proceed to step 104; if it does not, block the URL connection request;
Step 104: Send the URL connection request to the corresponding server, and receive the webpage content returned by the server;
When the category of the URL that the URL connection request asks to connect to conforms to the access policy set by the user, the device sends the URL connection request to the server corresponding to that URL. On receiving the connection request, the server returns the webpage content that the request asks to access to the device;
Step 105: Determine, according to the webpage content, the second category corresponding to the URL;
Optionally, for a specific procedure for determining the second category of the URL, see FIG. 2, which is a flowchart of determining the second category of webpage content disclosed in an embodiment of the present invention. As shown in FIG. 2, determining the second category may specifically include:
Step 201: Decode the webpage content and extract the identification keywords of the webpage content;
The extracted identification keywords may be, for example, "celebrity", "microblog" or "SMS", or sensitive terms such as "SARS" or "US president";
Step 202: According to the correspondence between identification keywords and webpage categories stored in a local lexicon list, determine that the second category corresponding to the URL is the webpage category corresponding to the extracted identification keywords;
The local lexicon list may classify keywords or sensitive terms as follows. Identification keywords for the leisure and entertainment category include: background, QQ Zone, greeting SMS, funny SMS, and so on. Identification keywords for the news category include: military, finance, reports, newspapers, and so on. Identification keywords for the sports category include: streetball, basketball, football, sailing, aerobics, and so on. If the data content extracted in step 201 includes the keyword "football", the webpage category of the URL corresponding to that data content may be identified as sports; if the entity executing the URL filtering method specifies that sports URLs may not be accessed, the webpage content containing that data content is blocked;
The method of determining the second category of a URL from webpage content is not limited to the above procedure. For example, semantic relationships can be extracted from the webpage content and matched against the semantic relationship templates in a pre-stored semantic library, and the webpage category corresponding to the matched template is taken as the second category. The specific ways of determining the second category from webpage content are not enumerated one by one here; any method that can determine the second category of a URL from webpage content falls within the protection scope of the present invention;
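Steps 201 and 202 can be sketched in Python as follows. The lexicon reuses the example keywords from the description (in English); the function names, the substring matching, and the rule that the category with the most matching keywords wins are assumptions made for illustration, since the description does not say how keywords are extracted or how conflicting keywords are resolved.

```python
from typing import Dict, List, Optional

# Category -> identification keywords, taken from the examples in the description.
LEXICON: Dict[str, List[str]] = {
    "entertainment": ["background", "qq zone", "greeting sms", "funny sms"],
    "news": ["military", "finance", "report", "newspaper"],
    "sports": ["streetball", "basketball", "football", "sailing", "aerobics"],
}


def extract_keywords(page: bytes, encoding: str = "utf-8") -> List[str]:
    """Step 201 (sketch): decode the page and return the lexicon keywords it contains."""
    text = page.decode(encoding, errors="replace").lower()
    return [kw for kws in LEXICON.values() for kw in kws if kw in text]


def second_category(page: bytes) -> Optional[str]:
    """Step 202 (sketch): map the extracted keywords to a webpage category.

    Assumption: the category with the most matching keywords wins, and a page
    with no matching keyword has no second category (None).
    """
    found = set(extract_keywords(page))
    hits = {cat: sum(kw in found for kw in kws) for cat, kws in LEXICON.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None
```

Plain substring matching is used only to keep the sketch short; a real device would more likely tokenize the decoded text before looking keywords up.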
Step 106: Determine whether the second category conforms to the preset URL access policy; if yes, proceed to step 107; if no, proceed to step 108;
Step 107: Send the webpage content to the client;
When the second category determined in step 106 is among the webpage categories allowed to pass by the access policy set by the user, the returned webpage content is sent to the client, which is served normally;
Step 108: Block the webpage content;
When the second category determined in step 106 is not among the webpage categories allowed to pass by the access policy set by the user, access is denied directly: the client cannot connect to the corresponding URL, and the returned webpage content is blocked.
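The two-stage check of steps 101 to 108 can be summarised in a short Python sketch. The function `filter_url` and its parameters are hypothetical: the category information table is modelled as a dictionary, the access policy as a set of allowed categories, and `fetch` and `classify` are caller-supplied stand-ins for forwarding the request to the server and for determining the second category. Treating a URL that is missing from the table as blocked is also an assumption; the description instead obtains the table from the remote classification server in that case.

```python
from typing import Callable, Dict, Optional, Set


def filter_url(
    url: str,
    category_table: Dict[str, str],
    allowed: Set[str],
    fetch: Callable[[str], bytes],
    classify: Callable[[bytes], Optional[str]],
) -> Optional[bytes]:
    """Return the webpage content to send to the client, or None if blocked."""
    first = category_table.get(url)   # step 102: look up the first category
    if first not in allowed:          # step 103: check it against the access policy
        return None                   # request blocked; nothing is fetched
    page = fetch(url)                 # step 104: forward request, receive content
    second = classify(page)           # step 105: second category from the content
    if second in allowed:             # step 106: check the second category
        return page                   # step 107: send content to the client
    return None                       # step 108: block the content
```

The second check is what catches a stale first category: a request that passes step 103 can still be stopped at step 106 once the actual content is seen.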
In this embodiment, the method finds, in the pre-stored category information table, the first category corresponding to the URL carried in the URL connection request; allows through any URL connection request whose first category conforms to the preset URL access policy and forwards it to the corresponding server; determines the second category corresponding to the URL according to the webpage content returned by the server; and then determines whether the second category conforms to the preset URL access policy. If it does, the webpage content is sent to the client; otherwise, the webpage content is blocked. With the URL filtering method disclosed by the present invention, the category of a URL is determined in real time, so that even when webpage content changes frequently or classification updates are late, a URL connection request that was allowed through but should in fact be blocked is still blocked promptly. This achieves accurate category-based filtering and improves the accuracy of URL filtering.
Embodiment 2
FIG. 3a is a flowchart of the second URL filtering method disclosed in an embodiment of the present invention. As shown in FIG. 3a, the URL filtering method may include:
Step 301: Receive a URL connection request initiated by a client;
Step 302: From the webpage categories corresponding to each URL in a pre-stored category information table, find the first category corresponding to the URL carried in the URL connection request;
Step 303: Determine whether the first category conforms to the preset URL access policy; if yes, proceed to step 306; if no, proceed to step 304;
Step 304: Block the URL connection request, and proceed to step 305;
Step 305: From the blocked URL connection requests, filter out those carrying a preset identifier, and proceed to step 306;
In practice, the content of some webpages changes dynamically. The same URL may belong to a category that does not conform to the URL access policy before a certain time and to a category that does conform after it, while the classification of the URL has not been updated in time, so that a URL connection request that should be allowed through is blocked. To avoid this, the user may adopt the method described in this step. The preset identifier may be a specific keyword, a fixed connection address, a user name, and so on;
Step 306: Send the URL connection request to the corresponding server, and receive the webpage content returned by the server;
Step 307: Determine, according to the webpage content, the second category corresponding to the URL;
Step 308: Determine whether the second category conforms to the preset URL access policy; if yes, proceed to step 309; if no, proceed to step 310;
Step 309: Send the webpage content to the client, and proceed to step 311;
Step 310: Block the webpage content;
Step 311: In the pre-stored category information table, update the webpage category corresponding to the URL carried in the URL connection request to the second category.
Optionally, the procedure shown in FIG. 3a may be adjusted by swapping the order of steps 305 and 304: before a URL connection request is blocked, it is first determined whether the request carries the preset identifier; if not, the request is blocked; otherwise, the request is allowed through. This changes batch processing into real-time processing, as shown in FIG. 3b:
Step 321: Receive a URL connection request initiated by a client;
Step 322: From the webpage categories corresponding to each URL in a pre-stored category information table, find the first category corresponding to the URL carried in the URL connection request;
Step 323: Determine whether the first category conforms to the preset URL access policy; if yes, proceed to step 326; if no, proceed to step 324;
Step 324: Determine whether the URL connection request carries the preset identifier; if yes, proceed to step 326; otherwise, proceed to step 325;
Step 325: Block the URL connection request;
Step 326: Send the URL connection request to the corresponding server, and receive the webpage content returned by the server;
Step 327: Determine, according to the webpage content, the second category corresponding to the URL;
Step 328: Determine whether the second category conforms to the preset URL access policy; if yes, proceed to step 329; if no, proceed to step 330;
Step 329: Send the webpage content to the client, and proceed to step 331;
Step 330: Block the webpage content;
Step 331: In the pre-stored category information table, update the webpage category corresponding to the URL carried in the URL connection request to the second category.
[...] the URL access policy, it can be concluded that the first category of the URL, whether cached locally or obtained from the remote classification server, is inaccurate. The locally cached first category corresponding to the URL can then be updated to the second category determined from the webpage content, or a URL category change request can be sent to the remote classification server so that the classification server can take follow-up action according to the request.
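The real-time variant of FIG. 3b (steps 321 to 331) can be sketched in Python as follows. All names are hypothetical. The request is reduced to its URL string, `has_preset_identifier` stands in for whatever check recognises the preset identifier (a keyword, a fixed connection address, or a user name), and updating the table after a block as well as after a release is an assumption, since the flow only states explicitly that step 329 proceeds to step 331.

```python
from typing import Callable, Dict, Optional, Set


def filter_url_with_identifier(
    url: str,
    category_table: Dict[str, str],
    allowed: Set[str],
    has_preset_identifier: Callable[[str], bool],
    fetch: Callable[[str], bytes],
    classify: Callable[[bytes], Optional[str]],
) -> Optional[bytes]:
    """Return the webpage content to send to the client, or None if blocked."""
    first = category_table.get(url)                              # step 322
    if first not in allowed and not has_preset_identifier(url):  # steps 323-324
        return None                                              # step 325
    page = fetch(url)                                            # step 326
    second = classify(page)                                      # step 327
    if second is not None:
        category_table[url] = second                             # step 331 (assumed on both branches)
    return page if second in allowed else None                   # steps 328-330
```

In this sketch a request whose first category fails the policy still reaches the content check when it carries the preset identifier, so a stale "blocked" entry in the table can be corrected by the second category.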
In this embodiment, the URL filtering method determines, according to the pre-stored first category of the URL, whether a URL connection request initiated by a client conforms to the preset URL access policy. If it does, the request is forwarded to the corresponding server, a second category is determined from the webpage content returned by the server, and it is determined whether the second category conforms to the preset URL access policy; if it does not, the returned webpage content is blocked. The method also avoids the case where a URL connection request that should be allowed through is blocked because the classification was not updated in time. With the URL filtering method disclosed by the present invention, the category of a URL is determined in real time, so that even when classification updates are late, a request that was allowed through but should in fact be blocked is blocked promptly, and a request that was not allowed through but should in fact be released is released promptly, achieving accurate category-based filtering.
Embodiment 3
FIG. 4 is a schematic structural diagram of the URL filtering device disclosed in an embodiment of the present invention. As shown in FIG. 4, the URL filtering device 40 may include:
a request receiving module 401, configured to receive a URL connection request initiated by a client;
a first category obtaining module 402, configured to find, among the webpage categories corresponding to each URL in a pre-stored category information table, the first category corresponding to the URL carried in the URL connection request;
For the specific structure of the first category obtaining module 402, see FIG. 5. As shown in FIG. 5, the first category obtaining module 402 may specifically include:
an information table obtaining submodule 4021, configured to obtain the pre-stored category information table from a remote classification server and cache it locally;
If a category information table covering the URL corresponding to the URL connection request is already cached locally, the table can be obtained directly from the local cache;
a first category determining submodule 4022, configured to look up, according to the correspondence between URLs and webpage categories in the category information table, the first category corresponding to the URL carried in the URL connection request;
an access determining module 403, configured to determine whether the first category conforms to the preset URL access policy, where the URL access policy contains the webpage categories that are allowed to pass;
a request sending module 404, configured to, when the determination result of the access determining module 403 is yes, send the URL connection request to the corresponding server and receive the webpage content returned by the server;
a category determining module 405, configured to determine, according to the webpage content, the second category corresponding to the URL, and to determine whether the second category conforms to the preset URL access policy;
For the specific structure of the category determining module 405, see FIG. 6. As shown in FIG. 6, it may include:
a webpage decoding submodule 4051, configured to decode the webpage content and extract the identification keywords of the webpage content;
a second category determining submodule 4052, configured to determine, according to the correspondence between identification keywords and webpage categories stored in a local lexicon list, that the second category corresponding to the URL is the webpage category corresponding to the extracted identification keywords;
a category determining submodule 4053, configured to determine whether the second category identified by the second category determining submodule 4052 conforms to the URL access policy;
a content returning module 406, configured to send the webpage content to the client when the determination result of the category determining module is yes;
a blocking module 407, configured to block the webpage content when the determination result of the category determining module is no.
需要说明的是, 本发明实施例的 URL过滤装置, 并不仅限于上述一种结 构, 比如, 第一类别获取模块 402和通行判断模块 403可以为集成于一体的 一个独立模块, 完成 URL连接请求的第一类别获取及判断所述第一类别是否 符合预设的通行策略; 再如, 所述内容返回模块 406和阻断模块 407可以为 一个模块。  It should be noted that the URL filtering apparatus of the embodiment of the present invention is not limited to the foregoing one structure. For example, the first category obtaining module 402 and the pass determining module 403 may be an independent module integrated into one, and complete the URL connection request. The first category obtains and determines whether the first category meets a preset traffic policy; for example, the content return module 406 and the blocking module 407 may be one module.
在其他的实施例中, URL过滤装置还可以包括分类更新模块, 用于在类 别判断模块 405判断出识别分类不符合所述预设的 URL通行策略的情况下, 将所述本地緩存的类别信息表中所述 URL连接请求中携带的 URL对应的网 页类别, 更新为所述第二类别。 进一步地, 参见图 7, 在其他的实施例中, 所述阻断模块 407还用于在所 述通行判断模块 403判断出所述第一类别不符合预设的 URL通行策略的情况 下, 阻断所述 URL连接请求; In other embodiments, the URL filtering apparatus may further include a category updating module, configured to: when the category determining module 405 determines that the identifying category does not conform to the preset URL pass policy, the locally cached category information. The webpage category corresponding to the URL carried in the URL connection request in the table is updated to the second category. Further, referring to FIG. 7, in another embodiment, the blocking module 407 is further configured to: when the traffic judging module 403 determines that the first category does not comply with a preset URL pass policy, Breaking the URL connection request;
The URL filtering apparatus may further include an identifier filtering module 701, configured to filter the blocked URL connection requests to obtain those carrying a preset identifier, and to trigger the request sending module 404 to send each such URL connection request to its corresponding server and receive the webpage content returned by the server. The category judgment module 405 determines, according to the webpage content, the second category corresponding to the URL, and judges whether the second category complies with the preset URL pass policy. If the identified category complies with the preset URL pass policy, the content returning module 406 sends the webpage content to the client; otherwise, the blocking module 407 blocks the webpage content.
Further, referring to FIG. 8, in other embodiments, the URL filtering apparatus may further include an identifier decision module 702, configured to judge, when the judgment result of the pass judgment module 403 is no, whether the URL connection request carries a preset identifier. If it carries the preset identifier, the identifier decision module 702 triggers the request sending module 404 to send the URL connection request to its corresponding server and receive the webpage content returned by the server; the category judgment module 405 determines, according to the webpage content, the second category corresponding to the URL, and judges whether the second category complies with the preset URL pass policy; if the identified category complies with the preset URL pass policy, the content returning module 406 sends the webpage content to the client; otherwise, the blocking module 407 is triggered to block the webpage content.
If the URL connection request does not carry the preset identifier, the blocking module 407 is triggered to block the URL connection request.
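The handling of a request whose first category fails the pass policy, in both the FIG. 7 and FIG. 8 variants, can be sketched as follows. This is only an illustrative sketch: the `UrlRequest` type, its `has_preset_flag` field, and the function names are hypothetical, and the specification does not say how the preset identifier is encoded in a request.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UrlRequest:
    """Hypothetical representation of a URL connection request."""
    url: str
    has_preset_flag: bool = False  # stands in for the "preset identifier"


def should_forward(request: UrlRequest, first_category_allowed: bool) -> bool:
    """FIG. 8 variant: decide before blocking.

    A request is forwarded to the server (and its content later re-checked
    against the pass policy) if its first category is allowed, or if it
    carries the preset identifier; otherwise it is blocked.
    """
    return first_category_allowed or request.has_preset_flag


def flagged_requests(blocked: List[UrlRequest]) -> List[UrlRequest]:
    """FIG. 7 variant: pick out of the already-blocked requests those that
    carry the preset identifier, so they can be forwarded for re-checking."""
    return [request for request in blocked if request.has_preset_flag]
```

In both variants the forwarded request still passes through the second-category judgment, so carrying the identifier only earns a content-based re-check rather than unconditional passage.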
In this embodiment, the apparatus can find, in the pre-stored category information table, the first category corresponding to the URL connection request, and release any URL connection request whose first category complies with the preset URL pass policy by forwarding it to the corresponding server. The apparatus then determines, according to the webpage content returned by the server, the second category corresponding to the URL, and judges whether the second category complies with the preset URL pass policy; if the second category complies with the preset URL pass policy, the webpage content is sent to the client; otherwise, the webpage content is blocked. The URL filtering apparatus disclosed in the present invention can thus determine the category of a URL in real time, so that even when the category table has not been updated in time, a URL connection request that was released but should in fact be blocked can still be blocked promptly, achieving accurate category-based filtering.

In addition, an embodiment of the present invention further discloses a gateway. As shown in FIG. 9, the gateway 90 includes the URL filtering apparatus 40 disclosed in the embodiments of the present invention. The gateway first receives a URL connection request initiated by a client, and then judges, according to the obtained first category of the URL, whether the URL connection request complies with the preset URL pass policy. If yes, the gateway sends the URL connection request to its corresponding server and receives the webpage content returned by the server; it then determines the second category of the webpage content and judges whether the second category complies with the preset URL pass policy. If yes, the returned webpage content is sent to the client; if no, the returned webpage content is blocked. The gateway can determine the category of a URL in real time, so that even when the category table has not been updated in time, a URL connection request that was released but should in fact be blocked can still be blocked promptly, achieving accurate category-based filtering.

The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for related details, refer to the description of the method.
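The two-stage flow performed by the gateway, together with the optional category update from the other embodiment, can be sketched end to end as below. This is a sketch under stated assumptions rather than the disclosed implementation: `filter_url`, its parameters, and the injected `fetch` and `classify` callables are hypothetical names, and treating a URL that is missing from the table as blocked is an assumption the specification does not make explicit.

```python
from typing import Callable, Dict, Optional, Set


def filter_url(
    url: str,
    category_table: Dict[str, str],           # cached URL -> first category
    allowed: Set[str],                        # URL pass policy
    fetch: Callable[[str], str],              # forwards the request to the server
    classify: Callable[[str], Optional[str]]  # content -> second category
) -> Optional[str]:
    """Return the content to send to the client, or None if blocked."""
    # Stage 1: look up the first category and check it against the policy.
    first = category_table.get(url)
    if first not in allowed:
        return None  # block the URL connection request (assumed for unknown URLs too)

    # Stage 2: forward the request, classify the returned content, re-check.
    content = fetch(url)
    second = classify(content)
    if second in allowed:
        return content

    # The cached category was stale: record the second category so the
    # next request for this URL is blocked at stage 1.
    if second is not None:
        category_table[url] = second
    return None
```

Passing `fetch` and `classify` in as callables keeps the sketch self-contained; in a real gateway they would be the request sending module and the category judgment module.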
It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing description of the disclosed embodiments enables a person skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to a person skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for filtering a uniform resource locator (URL), comprising:
receiving a URL connection request initiated by a client;
finding, among the webpage categories corresponding to each URL in a pre-stored category information table, a first category corresponding to the URL carried in the URL connection request;
judging whether the first category complies with a preset URL pass policy, wherein the URL pass policy includes the webpage categories that are allowed to pass;
if the first category complies, sending the URL connection request to its corresponding server, and receiving webpage content returned by the server; and
determining, according to the webpage content, a second category corresponding to the URL, and judging whether the second category complies with the preset URL pass policy; if the second category complies with the preset URL pass policy, sending the webpage content to the client; otherwise, blocking the webpage content.
2. The method according to claim 1, wherein the determining, according to the webpage content, a second category corresponding to the URL comprises:
decoding the webpage content, and extracting an identification keyword of the webpage content; and
determining, according to a correspondence between identification keywords and webpage categories stored in a local lexicon list, that the second category corresponding to the URL is the webpage category corresponding to the extracted identification keyword.
3. The method according to claim 1, wherein the finding, among the webpage categories corresponding to each URL in a pre-stored category information table, a first category corresponding to the URL carried in the URL connection request comprises:
obtaining the pre-stored category information table from a remote classification server and caching it locally; and
finding, according to the correspondence between URLs and webpage categories in the category information table, the first category corresponding to the URL carried in the URL connection request.
4. The method according to claim 1, wherein if the identified category does not comply with the preset URL pass policy, the method further comprises:
updating, in the pre-stored category information table, the webpage category corresponding to the URL carried in the URL connection request to the second category.
5. The method according to any one of claims 1 to 4, wherein if the first category does not comply with the preset URL pass policy, the method further comprises:
blocking the URL connection request;
filtering the blocked URL connection requests to obtain URL connection requests carrying a preset identifier, sending each URL connection request obtained by filtering to its corresponding server, and receiving webpage content returned by the server; and
determining, according to the returned webpage content, the second category corresponding to the URL, and judging whether the second category complies with the preset URL pass policy; if the identified category complies with the preset URL pass policy, sending the webpage content to the client; otherwise, blocking the webpage content.
6. The method according to any one of claims 1 to 4, wherein if the first category does not comply with the preset URL pass policy, the method further comprises:
judging whether the URL connection request carries a preset identifier;
if the URL connection request carries the preset identifier, sending the URL connection request to its corresponding server, and receiving webpage content returned by the server; determining, according to the webpage content, the second category corresponding to the URL, and judging whether the second category complies with the preset URL pass policy; if the identified category complies with the preset URL pass policy, sending the webpage content to the client; otherwise, blocking the webpage content; and
if the URL connection request does not carry the preset identifier, blocking the URL connection request.
7. An apparatus for filtering a uniform resource locator (URL), comprising:
a request receiving module, configured to receive a URL connection request initiated by a client;
a first category obtaining module, configured to find, among the webpage categories corresponding to each URL in a pre-stored category information table, a first category corresponding to the URL carried in the URL connection request;
a pass judgment module, configured to judge whether the first category complies with a preset URL pass policy, wherein the URL pass policy includes the webpage categories that are allowed to pass;
a request sending module, configured to, when the judgment result of the pass judgment module is yes, send the URL connection request to its corresponding server, and receive webpage content returned by the server;
a category judgment module, configured to determine, according to the webpage content, a second category corresponding to the URL, and judge whether the second category complies with the preset URL pass policy;
a content returning module, configured to send the webpage content to the client when the judgment result of the category judgment module is yes; and
a blocking module, configured to block the webpage content when the judgment result of the category judgment module is no.
8. The apparatus according to claim 7, wherein the category judgment module comprises:
a webpage decoding sub-module, configured to decode the webpage content and extract an identification keyword of the webpage content;
a second category determining sub-module, configured to determine, according to a correspondence between identification keywords and webpage categories stored in a local lexicon list, that the second category corresponding to the URL is the webpage category corresponding to the extracted identification keyword; and
a category judgment sub-module, configured to judge whether the second category identified by the second category determining sub-module complies with the URL pass policy.
9. The apparatus according to claim 7, wherein the first category obtaining module specifically comprises:
an information table obtaining sub-module, configured to obtain the pre-stored category information table from a remote classification server and cache it locally; and
a first category determining sub-module, configured to find, according to the correspondence between URLs and webpage categories in the category information table, the first category corresponding to the URL carried in the URL connection request.
10. The apparatus according to claim 7, further comprising:
a category updating module, configured to, when the category judgment module judges that the identified category does not comply with the preset URL pass policy, update the webpage category that corresponds, in the locally cached category information table, to the URL carried in the URL connection request to the second category.
11. The apparatus according to any one of claims 7 to 10, wherein the blocking module is further configured to:
block the URL connection request when the pass judgment module judges that the first category does not comply with the preset URL pass policy;
and the apparatus further comprises:
an identifier filtering module, configured to filter the blocked URL connection requests to obtain URL connection requests carrying a preset identifier;
wherein, for each URL connection request obtained by filtering, the request sending module sends the URL connection request to its corresponding server, and receives webpage content returned by the server;
the category judgment module determines, according to the returned webpage content, the second category corresponding to the URL, and judges whether the second category complies with the preset URL pass policy; and
if the identified category complies with the preset URL pass policy, the content returning module sends the webpage content to the client; otherwise, the blocking module blocks the webpage content.
12. The apparatus according to any one of claims 7 to 10, further comprising:
an identifier decision module, configured to judge, when the judgment result of the pass judgment module is no, whether the URL connection request carries a preset identifier; if the URL connection request carries the preset identifier, trigger the request sending module to send the URL connection request to its corresponding server and receive webpage content returned by the server, whereupon the category judgment module determines, according to the webpage content, the second category corresponding to the URL and judges whether the second category complies with the preset URL pass policy; if the identified category complies with the preset URL pass policy, the content returning module sends the webpage content to the client; otherwise, the blocking module is triggered to block the webpage content; and
if the URL connection request does not carry the preset identifier, trigger the blocking module to block the URL connection request.
PCT/CN2012/081548 2011-12-31 2012-09-18 Method and device for filtering uniform resource locator (url) WO2013097494A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/307,014 US9331981B2 (en) 2011-12-31 2014-06-17 Method and apparatus for filtering URL

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201110459686 2011-12-31
CN201110459686.7 2011-12-31
CN201210022574.XA CN102624703B (en) 2011-12-31 2012-02-01 Method and device for filtering uniform resource locators (URLs)
CN201210022574.X 2012-02-01

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/307,014 Continuation US9331981B2 (en) 2011-12-31 2014-06-17 Method and apparatus for filtering URL

Publications (1)

Publication Number Publication Date
WO2013097494A1 true WO2013097494A1 (en) 2013-07-04

Family

ID=46564388

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/081548 WO2013097494A1 (en) 2011-12-31 2012-09-18 Method and device for filtering uniform resource locator (url)

Country Status (3)

Country Link
US (1) US9331981B2 (en)
CN (1) CN102624703B (en)
WO (1) WO2013097494A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114244654A (en) * 2021-12-20 2022-03-25 中国平安财产保险股份有限公司 URL forwarding method, device, equipment and computer storage medium

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102624703B (en) 2011-12-31 2015-01-21 华为数字技术(成都)有限公司 Method and device for filtering uniform resource locators (URLs)
CN102760162A (en) * 2012-06-11 2012-10-31 北京搜狗信息服务有限公司 Method and device for revealing and acquiring download link
CN102819591B (en) * 2012-08-07 2016-04-06 北京网康科技有限公司 A kind of content-based Web page classification method and system
CN102999590B (en) * 2012-11-16 2015-07-29 北京奇虎科技有限公司 Identify the method and system of official website
CN103198091B (en) * 2012-12-04 2016-12-21 网易(杭州)网络有限公司 The processing method of a kind of online data based on user behavior request and equipment
US9332291B1 (en) * 2012-12-27 2016-05-03 Google Inc. Enforcing publisher content item block requests
CN104079528A (en) * 2013-03-26 2014-10-01 北大方正集团有限公司 Method and system of safety protection of Web application
CN103366019B (en) * 2013-08-06 2016-09-28 飞天诚信科技股份有限公司 A kind of webpage hold-up interception method based on iOS device and equipment
KR20150078130A (en) * 2013-12-30 2015-07-08 삼성전자주식회사 Method and system for controlling content
CN103995773B (en) * 2014-02-28 2019-11-22 上海斐讯数据通信技术有限公司 A kind of automatic test approach of url filtering function
CN105591997B (en) * 2014-10-20 2019-04-09 杭州迪普科技股份有限公司 A kind of URL classification filter method and device
US10021102B2 (en) * 2014-10-31 2018-07-10 Aruba Networks, Inc. Leak-proof classification for an application session
DE102015007876A1 (en) 2015-06-22 2017-01-05 Eblocker Gmbh Network control device
CN105704120B (en) * 2016-01-05 2019-03-19 中云网安科技(北京)有限公司 A method of the secure access network based on self study form
US10034263B2 (en) 2016-06-30 2018-07-24 Hewlett Packard Enterprise Development Lp Determining scale for received signal strength indicators and coordinate information
CN108122090A (en) * 2016-11-30 2018-06-05 北京国双科技有限公司 A kind of Working information processing method and server
CN109726347A (en) * 2018-12-29 2019-05-07 杭州迪普科技股份有限公司 Network request automatic classification method and relevant device
CN110311983B (en) * 2019-07-09 2021-04-06 北京字节跳动网络技术有限公司 Service request processing method, device and system, electronic equipment and storage medium
CN112861031B (en) * 2019-11-27 2024-04-02 北京金山云网络技术有限公司 URL refreshing method, device and equipment in CDN and CDN node
US11595352B2 (en) * 2020-12-21 2023-02-28 Microsoft Technology Licensing, Llc Performing transport layer security (TLS) termination using categories of web categorization

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040107177A1 (en) * 2002-06-17 2004-06-03 Covill Bruce Elliott Automated content filter and URL translation for dynamically generated web documents
CN101163161A (en) * 2007-11-07 2008-04-16 福建星网锐捷网络有限公司 United resource localizer address filtering method and intermediate transmission equipment
CN101261644A (en) * 2008-04-30 2008-09-10 杭州华三通信技术有限公司 Method and device for accessing united resource positioning symbol database
CN102624703A (en) * 2011-12-31 2012-08-01 成都市华为赛门铁克科技有限公司 Method and device for filtering uniform resource locators (URLs)

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6266664B1 (en) * 1997-10-01 2001-07-24 Rulespace, Inc. Method for scanning, analyzing and rating digital information content
US6065055A (en) * 1998-04-20 2000-05-16 Hughes; Patrick Alan Inappropriate site management software
US6772214B1 (en) * 2000-04-27 2004-08-03 Novell, Inc. System and method for filtering of web-based content stored on a proxy cache server
US20030014659A1 (en) * 2001-07-16 2003-01-16 Koninklijke Philips Electronics N.V. Personalized filter for Web browsing
US20030163731A1 (en) * 2002-02-28 2003-08-28 David Wigley Method, system and software product for restricting access to network accessible digital information
US7383248B2 (en) * 2002-12-12 2008-06-03 Jay Chieh Chen Hyperlink park and search
US20080209057A1 (en) * 2006-09-28 2008-08-28 Paul Martini System and Method for Improved Internet Content Filtering
CN101035128B (en) * 2007-04-18 2010-04-21 大连理工大学 Three-folded webpage text content recognition and filtering method based on the Chinese punctuation
CN101350810A (en) 2007-07-20 2009-01-21 莱克斯信息技术(北京)有限公司 Url filtrating base on authentication user set
AU2009267107A1 (en) * 2008-06-30 2010-01-07 Websense, Inc. System and method for dynamic and real-time categorization of webpages
US20100318681A1 (en) * 2009-06-12 2010-12-16 Barracuda Networks, Inc Protocol-independent, mobile, web filter system provisioning dns triage, uri scanner, and query proxy services
US20110289434A1 (en) * 2010-05-20 2011-11-24 Barracuda Networks, Inc. Certified URL checking, caching, and categorization service
CN102271331B (en) * 2010-06-02 2014-12-10 中国移动通信集团广东有限公司 Method and system for detecting reliability of service provider (SP) site
US8732857B2 (en) * 2010-12-23 2014-05-20 Sosvia, Inc. Client-side access control of electronic content
CN102073722A (en) 2011-01-11 2011-05-25 吕晓东 URL (Uniform Resource Locator) cloud publishing system
CN102137121A (en) * 2011-05-09 2011-07-27 北京艾普优计算机系统有限公司 Method, device and system for processing data
CN102185859A (en) * 2011-05-09 2011-09-14 北京艾普优计算机系统有限公司 Computer system and data interaction method
US20130091580A1 (en) * 2011-10-11 2013-04-11 Mcafee, Inc. Detect and Prevent Illegal Consumption of Content on the Internet

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114244654A (en) * 2021-12-20 2022-03-25 中国平安财产保险股份有限公司 URL forwarding method, device, equipment and computer storage medium
CN114244654B (en) * 2021-12-20 2023-09-26 中国平安财产保险股份有限公司 URL forwarding method, device, equipment and computer storage medium

Also Published As

Publication number Publication date
CN102624703B (en) 2015-01-21
CN102624703A (en) 2012-08-01
US9331981B2 (en) 2016-05-03
US20140298445A1 (en) 2014-10-02

Similar Documents

Publication Publication Date Title
WO2013097494A1 (en) Method and device for filtering uniform resource locator (url)
US9544295B2 (en) Login method for client application and corresponding server
US9148332B2 (en) Content delivery network
WO2021012553A1 (en) Data processing method and related device
US20150074289A1 (en) Detecting error pages by analyzing server redirects
AU2017389607B2 (en) Method and apparatus for updating search cache
KR102090982B1 (en) How to identify malicious websites, devices and computer storage media
CN110943961A (en) Data processing method, device and storage medium
WO2019109529A1 (en) Webpage identification method, device, computer apparatus, and computer storage medium
WO2013181972A1 (en) Method and device for identifying network access behaviour
CN107301215B (en) Search result caching method and device and search method and device
US20150222649A1 (en) Method and apparatus for processing a webpage
CN109743309B (en) Illegal request identification method and device and electronic equipment
CN114900546B (en) Data processing method, device and equipment and readable storage medium
WO2012062107A1 (en) Method and apparatus for data processing based on surfing behavior of mobile telephone user
US20160034589A1 (en) Method and system for search term whitelist expansion
CN109495471B (en) Method, device and equipment for judging WEB attack result and readable storage medium
CN111382206A (en) Data storage method and device
WO2018153236A1 (en) Method and apparatus for accelerating dynamic resource access based on api request, medium, and device
CN108664493B (en) Method and device for counting validity of URL (Uniform resource locator), electronic equipment and storage medium
US20090055931A1 (en) Device and method for detecting vulnerability of web server using multiple search engines
WO2014164247A2 (en) System and method to allow a domain name server to process a natural language query and determine context
US20080005252A1 (en) Searching users in heterogeneous instant messaging services
US10313127B1 (en) Method and system for detecting and alerting users of device fingerprinting attempts
CN111078697A (en) Data storage method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12861067

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12861067

Country of ref document: EP

Kind code of ref document: A1