US20190266210A1 - Using cache and bloom filters for url lookups - Google Patents
- Publication number
- US20190266210A1 (Application US16/286,164)
- Authority
- US
- United States
- Prior art keywords
- url
- query
- bloom filter
- received
- response
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9566—URL specific, e.g. using aliases, detecting broken or misspelled links
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/957—Browsing optimisation, e.g. caching or content distillation
- G06F16/9574—Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9017—Indexing; Data structures therefor; Storage structures using directory or table look-up
Definitions
- Firewalls and other security devices typically enforce policies against network transmissions based on a set of rules.
- the rules may be based on uniform resource locator (URL) information, such as by preventing a user from accessing a specific URL (e.g., denying access to http://www.example.com), or by preventing a user from accessing a category of the URL (e.g., denying access to sites classified as “social networking” sites or “pornographic” sites).
- FIG. 1 illustrates an embodiment of an environment in which policies that include URL information are enforced.
- FIG. 2 illustrates an embodiment of a policy enforcement appliance.
- FIG. 3 illustrates an embodiment of a policy enforcement appliance.
- FIG. 4A illustrates an example of a URL.
- FIG. 4B illustrates a portion of a URL.
- FIG. 4C illustrates a portion of a URL.
- FIG. 4D illustrates a portion of a URL.
- FIG. 5A illustrates a representation of processing performed by a policy enforcement appliance in some embodiments.
- FIG. 5B illustrates a representation of processing performed by a policy enforcement appliance in some embodiments.
- FIG. 5C illustrates a representation of processing performed by a policy enforcement appliance in some embodiments.
- FIG. 6 illustrates an embodiment of a process for enforcing a policy based at least in part on URL information.
- the invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor.
- these implementations, or any other form that the invention may take, may be referred to as techniques.
- the order of the steps of disclosed processes may be altered within the scope of the invention.
- a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task.
- the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
- FIG. 1 illustrates an embodiment of an environment in which policies that include URL information are enforced.
- clients 104 and 106 are a laptop computer and desktop computer, respectively, present in an enterprise network 108 .
- Policy enforcement appliance 102 (also referred to herein as “appliance 102 ”) is configured to enforce policies regarding communications between clients, such as clients 104 and 106 , and nodes outside of enterprise network 108 (e.g., reachable via external network 110 ).
- One example of a policy is a rule prohibiting any access to site 112 (a pornographic website) by any client inside network 108 .
- Another example of a policy is a rule prohibiting access to social networking site 114 by clients between the hours of 9 am and 6 pm.
- policy enforcement appliance 102 is also configured to enforce policies with respect to traffic that stays within enterprise network 108 .
- policy enforcement appliance 102 can be implemented in a variety of ways. Specifically, policy enforcement appliance 102 can be a dedicated device or set of devices. The functionality provided by appliance 102 can also be integrated into or executed as software on a general purpose computer, a computer server, a gateway, and/or a network/routing device. Further, whenever appliance 102 is described as performing a task, a single component, a subset of components, or all components of appliance 102 may cooperate to perform the task. Similarly, whenever a component of appliance 102 is described as performing a task, a subcomponent may perform the task and/or the component may perform the task in conjunction with other components. In various embodiments, portions of appliance 102 are provided by one or more third parties.
- appliance 102 may be omitted and the techniques described herein adapted accordingly. Similarly, additional logical components/features can be added to system 102 as applicable. As one example, multiple bloom filters may be included.
- FIG. 2 illustrates an embodiment of a policy enforcement appliance.
- appliance 102 includes a high performance multi-core CPU 202 and RAM 204 .
- Appliance 102 also includes a storage 210 (such as one or more hard disks), which is used to store policy and other configuration information, as well as URL information.
- Data appliance 102 can also include one or more optional hardware accelerators.
- data appliance 102 can include a cryptographic engine 206 configured to perform encryption and decryption operations, and one or more FPGAs 208 configured to perform matching, act as network processors, and/or perform other tasks.
- FIG. 3 illustrates an embodiment of a policy enforcement appliance.
- the functionality of policy enforcement appliance 102 is implemented in a firewall.
- appliance 102 includes a management plane 302 and a data plane 304 .
- the management plane is responsible for managing user interactions, such as by providing a user interface for configuring policies ( 318 ) and viewing log data.
- the data plane is responsible for managing data, such as by performing packet processing (e.g., to extract URLs) and session handling.
- a scheduler is responsible for managing the scheduling of requests (e.g., as presented by data plane 304 to management plane 302 , or as presented by management plane 302 to URL server 316 ).
- One task performed by the firewall is URL filtering.
- network 108 belongs to a company, “ACME Corporation.”
- Specified in appliance 102 are a set of policies 318 , some of which govern the types of websites that employees may access, and under what conditions.
- included in the firewall is a policy that permits employees to access news-related websites.
- Another policy included in the firewall prohibits, at all times, employees from accessing pornographic websites.
- Also included in the firewall is a database of URLs and associated categories. Other information can also be associated with the URLs in the database instead of or in addition to category information, and that other information can be used in conjunction with policy enforcement.
- the database is provided by a third party, such as through a subscription service.
- a transformation is applied to the URLs prior to storage.
- MD5 hashes of URLs can be stored in database 312 , rather than the URLs themselves.
- the URLs stored in database 312 represent the top n URLs for which access is most likely to be sought by users of client devices, such as client 104 , where n can be configured based on the computing and other resources available to appliance 102 .
- database 312 includes 20 million URLs and is stored in storage 210 .
- a bloom filter 308 is compiled from the contents of database 312 and is loaded into RAM 204 .
- the bloom filter is compiled as a bitmask.
- bloom filter 308 is recompiled.
- Various caches 306, 310, and 314 are also loaded into RAM 204.
- In some embodiments, all or some of caches 306, 310, and 314 are omitted from appliance 102 and the processing described herein is adapted accordingly. Additional detail regarding components shown in FIG. 3 will be provided below.
- When a user of client 104 (an employee referred to herein as “Alice”) attempts to engage in activities such as web surfing, communications from and to the client pass through policy enforcement appliance 102.
- Appliance 102 is configured to evaluate the URL of the site Alice would like to visit and determine whether access should be permitted.
- FIG. 4A illustrates an example of a URL ( 402 ) and FIGS. 4B-4D illustrate portions of URL 402 .
- FIG. 4B illustrates URL 402 up through the first subpath
- FIG. 4C illustrates the hostname portion of URL 402
- FIG. 4D illustrates the domain portion of URL 402 .
- Portions 404 - 408 are also referred to herein as “URLs 404 - 408 .”
- A match against the most specific portion of URL 402 (e.g., URL 404) is attempted first, with fallbacks to more generalized versions of the URL (e.g., URLs 406 and 408, respectively).
- the URL is evaluated by appliance 102 as follows.
- the data plane consults cache 306 for the presence of each of URLs 404 , 406 , and 408 , in order, until a match is found. If one of the URLs is present, the associated category that is also stored in cache 306 is used to enforce any applicable policies 318 . If none of the URLs are present in cache 306 , a temporary entry is inserted into cache 306 indicating that the URL is being resolved.
- a URL being resolved is assigned a temporary category of “UNRESOLVED.”
- an entry for each of URLs 404 - 408 (and a corresponding status of “UNRESOLVED”) is included in cache 306 .
- only one entry is made, such as an entry for URL 404 .
- Additional requests received by appliance 102 for access to URL 402 (or portions thereof) will be queued pending the resolution.
- a timeout condition is placed on UNRESOLVED entries included in cache 306 , such that if the entry is not updated within a specified period of time, the entry is removed.
- URL 404 is checked first, as follows: URL 404 is transformed as applicable (e.g., an MD5 hash of URL 404 is computed). For the remainder of the discussion of this example, no distinction will be made between the URL and the MD5 (or other transformation) of the URL, to aid in clarity. It is to be assumed that if database 312 stores MD5 hashes, the queries performed against it (and the corresponding bloom filter and queries against the bloom filter) will be performed using MD5 (or other applicable) transformations of URLs.
- a REJECT response if received from bloom filter 308 for URL 404 , indicates with 100% confidence that URL 404 is not present in database 312 .
- An ACCEPT response indicates that URL 404 is present in database 312 , subject to a given false positive rate.
- the desired false positive rate of bloom filter 308 is configurable and is in some embodiments set at 10%, meaning that an ACCEPT response indicates, with 90% confidence, that the URL is present in database 312 . Additional detail of how elements 308 , 310 , and 312 are used to process URLs is provided with reference to FIGS. 5A-5C .
- FIGS. 5A-5C illustrate representations of processing performed by a policy enforcement appliance in some embodiments.
- URL 408 is present in database 312 (i.e., an MD5 hash of URL 408 is present), while URLs 404 and 406 are not.
- bloom filter 308 will indicate a false positive for URL 404 .
- a match is performed using URL 404 ( 502 ).
- Bloom filter 308 reports an “accept,” ( 504 ) meaning that there is a 90% chance that URL 404 is present in database 312 .
- Cache 310 is evaluated for the presence of URL 404 ( 506 ).
- URL 404 is not present in the cache ( 508 ), and so a query of database 312 is performed using URL 404 ( 510 ).
- the ACCEPTance of URL 404 by the bloom filter was a false positive.
- URL 404 is not present in database 312 . Accordingly, the query of database 312 for URL 404 will also fail ( 512 ).
- a match against bloom filter 308 for URL 406 is performed ( 532 ).
- the bloom filter reports a REJECT ( 534 ), indicating with 100% confidence that the URL is not present in database 312 . There is accordingly no need to perform lookups against cache 310 or database 312 using URL 406 .
- a match against bloom filter 308 for URL 408 is performed ( 572 ).
- the bloom filter reports an ACCEPT, ( 574 ) meaning that there is a 90% chance that URL 408 is present in database 312 .
- Cache 310 is evaluated for the presence of URL 408 ( 576 ).
- URL 408 is not present in the cache ( 578 ), and so a query of database 312 is performed using URL 408 ( 580 ).
- URL 408 is present in database 312 and so the corresponding category NEWS is returned (582) and ultimately provided to data plane 304, which will update the entry in cache 306 by changing the UNRESOLVED category to NEWS.
- only the finally matched URL ( 408 ) is updated in cache 306 .
- entries for each of URLs 404 , 406 , and 408 are updated in cache 306 with a NEWS category. The category will be used by the firewall to enforce any applicable rules.
- Cache 310 is also updated to include the returned category and URL 408 (i.e., its MD5 hash).
- cache 310 is also updated when result 512 is returned.
- URL 404 is included in cache 310 along with a category of UNKNOWN.
- UNKNOWN category included in cache 310 for URL 404 is modified to match the result.
- URL server 316 is made available by the provider of the contents of database 312 , and contains URL information that supplements the information included in database 312 (e.g., by including many millions of additional URLs and corresponding categories). URL server 316 can also be under the control of the owner of appliance 102 or any other appropriate party. In various embodiments, a bloom filter corresponding to the data stored by URL server 316 is included in appliance 102 .
- If URLs 404-408 are also absent from URL server 316, a category of UNKNOWN will be returned and appropriate policies applied, based on the category, such as by blocking access to URL 402.
- Cache 306 can also be updated by switching the temporary category of UNRESOLVED to UNKNOWN.
- cache 314 is updated based on results returned by URL server 316 .
- URLs with UNKNOWN categorization have a timeout, thus allowing for resolution of the categorization during a subsequent request.
- FIG. 6 illustrates an embodiment of a process for enforcing a policy based at least in part on URL information.
- the process shown in FIG. 6 is performed by policy enforcement appliance 102 and, in various embodiments, multiple instances of the process shown in FIG. 6 or portions thereof are performed in parallel on appliance 102 , as applicable.
- the process begins at 602 when a URL is received.
- a URL is received when data plane 304 extracts a URL out of a packet received from client 104 .
- the URL is matched against a bloom filter.
- what is matched is a portion of the URL (e.g., portions 404 - 408 of URL 402 ), and/or a transformation of the URL (e.g., an MD5 hash of the URL or URL portion).
- URL 404 is matched against bloom filter 308 (as illustrated in FIG. 5A at 502 ).
- URL 408 is matched against bloom filter 308 (as illustrated in FIG. 5C at 572 ).
- a first query is performed, based on a result of the match.
- query 510 is performed.
- a policy is enforced based at least in part on a category received as a result of a second query.
- a policy is enforced based on the receipt of the “NEWS” category.
- the first and second query may be different (e.g., where the first query is query 576 and the second query is 580 ; where the first query is query 506 or query 510 and the second query is query 580 ; or where the first query is performed against database 312 and the second query is performed against cache 314 or remote URL server 316 ).
- the first and second query may be the same (e.g., where the first and second queries are both query 580 ).
Abstract
Description
- This application is a continuation of co-pending U.S. patent application Ser. No. 13/111,131 entitled USING CACHE AND BLOOM FILTERS FOR URL LOOKUPS filed May 19, 2011 which is incorporated herein by reference for all purposes.
- Firewalls and other security devices typically enforce policies against network transmissions based on a set of rules. In some cases, the rules may be based on uniform resource locator (URL) information, such as by preventing a user from accessing a specific URL (e.g., denying access to http://www.example.com), or by preventing a user from accessing a category of the URL (e.g., denying access to sites classified as “social networking” sites or “pornographic” sites). Unfortunately, given the sheer volume of URLs in existence, it can be difficult to efficiently match rules that make use of URL information.
- Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
- FIG. 1 illustrates an embodiment of an environment in which policies that include URL information are enforced.
- FIG. 2 illustrates an embodiment of a policy enforcement appliance.
- FIG. 3 illustrates an embodiment of a policy enforcement appliance.
- FIG. 4A illustrates an example of a URL.
- FIG. 4B illustrates a portion of a URL.
- FIG. 4C illustrates a portion of a URL.
- FIG. 4D illustrates a portion of a URL.
- FIG. 5A illustrates a representation of processing performed by a policy enforcement appliance in some embodiments.
- FIG. 5B illustrates a representation of processing performed by a policy enforcement appliance in some embodiments.
- FIG. 5C illustrates a representation of processing performed by a policy enforcement appliance in some embodiments.
- FIG. 6 illustrates an embodiment of a process for enforcing a policy based at least in part on URL information.
- The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
- A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
- FIG. 1 illustrates an embodiment of an environment in which policies that include URL information are enforced. In the example shown, clients 104 and 106 are a laptop computer and desktop computer, respectively, present in an enterprise network 108. Policy enforcement appliance 102 (also referred to herein as “appliance 102”) is configured to enforce policies regarding communications between clients, such as clients 104 and 106, and nodes outside of enterprise network 108 (e.g., reachable via external network 110). One example of a policy is a rule prohibiting any access to site 112 (a pornographic website) by any client inside network 108. Another example of a policy is a rule prohibiting access to social networking site 114 by clients between the hours of 9 am and 6 pm. Yet another example of a policy is a rule allowing access to streaming video website 116, subject to a bandwidth or other consumption constraint. Other types of policies can also be enforced, such as ones governing traffic shaping, quality of service, or routing with respect to a given URL, pattern of URLs, category of URL, or other URL information. In some embodiments, policy enforcement appliance 102 is also configured to enforce policies with respect to traffic that stays within enterprise network 108.
- The functionality provided by policy enforcement appliance 102 can be implemented in a variety of ways. Specifically, policy enforcement appliance 102 can be a dedicated device or set of devices. The functionality provided by appliance 102 can also be integrated into or executed as software on a general purpose computer, a computer server, a gateway, and/or a network/routing device. Further, whenever appliance 102 is described as performing a task, a single component, a subset of components, or all components of appliance 102 may cooperate to perform the task. Similarly, whenever a component of appliance 102 is described as performing a task, a subcomponent may perform the task and/or the component may perform the task in conjunction with other components. In various embodiments, portions of appliance 102 are provided by one or more third parties. Depending on factors such as the amount of computing resources available to appliance 102, various logical components and/or features of appliance 102 may be omitted and the techniques described herein adapted accordingly. Similarly, additional logical components/features can be added to system 102 as applicable. As one example, multiple bloom filters may be included.
- FIG. 2 illustrates an embodiment of a policy enforcement appliance. The example shown is a representation of physical components that are included in appliance 102, in some embodiments. Specifically, appliance 102 includes a high performance multi-core CPU 202 and RAM 204. Appliance 102 also includes a storage 210 (such as one or more hard disks), which is used to store policy and other configuration information, as well as URL information. Data appliance 102 can also include one or more optional hardware accelerators. For example, data appliance 102 can include a cryptographic engine 206 configured to perform encryption and decryption operations, and one or more FPGAs 208 configured to perform matching, act as network processors, and/or perform other tasks.
- FIG. 3 illustrates an embodiment of a policy enforcement appliance. In the example shown, the functionality of policy enforcement appliance 102 is implemented in a firewall. Specifically, appliance 102 includes a management plane 302 and a data plane 304. The management plane is responsible for managing user interactions, such as by providing a user interface for configuring policies (318) and viewing log data. The data plane is responsible for managing data, such as by performing packet processing (e.g., to extract URLs) and session handling. In various embodiments, a scheduler is responsible for managing the scheduling of requests (e.g., as presented by data plane 304 to management plane 302, or as presented by management plane 302 to URL server 316).
- One task performed by the firewall is URL filtering. Suppose network 108 belongs to a company, “ACME Corporation.” Specified in appliance 102 are a set of policies 318, some of which govern the types of websites that employees may access, and under what conditions. As one example, included in the firewall is a policy that permits employees to access news-related websites. Another policy included in the firewall prohibits, at all times, employees from accessing pornographic websites. Also included in the firewall is a database of URLs and associated categories. Other information can also be associated with the URLs in the database instead of or in addition to category information, and that other information can be used in conjunction with policy enforcement.
- In some embodiments, the database is provided by a third party, such as through a subscription service. In such a scenario, it is possible that instead of the URLs being directly stored in database 312, a transformation is applied to the URLs prior to storage. As one example, MD5 hashes of URLs can be stored in database 312, rather than the URLs themselves. The URLs stored in database 312 (or transformations thereof) represent the top n URLs for which access is most likely to be sought by users of client devices, such as client 104, where n can be configured based on the computing and other resources available to appliance 102. As one example, database 312 includes 20 million URLs and is stored in storage 210. A bloom filter 308 is compiled from the contents of database 312 and is loaded into RAM 204. In some embodiments, the bloom filter is compiled as a bitmask. Whenever changes are made to database 312 (e.g., as an update provided by a vendor), bloom filter 308 is recompiled. Also included in the firewall are various caches 306, 310, and 314, also loaded into RAM 204. In some embodiments, all or some of caches 306, 310, and 314 are omitted from appliance 102 and the processing described herein is adapted accordingly. Additional detail regarding components shown in FIG. 3 will be provided below.
- When a user of client 104 (an employee referred to herein as “Alice”) attempts to engage in activities such as web surfing, communications from and to the client pass through policy enforcement appliance 102. As one example, suppose Alice has launched a web browser application on client 104 and would like to visit an arbitrary web page. Appliance 102 is configured to evaluate the URL of the site Alice would like to visit and determine whether access should be permitted.
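- As a rough illustration of why a bloom filter compiled from a database of 20 million entries can be held in RAM 204, the standard bloom filter sizing formulas can be evaluated for a few target false positive rates. The sketch below is not taken from the specification: the function name `bloom_parameters` and the two target rates are assumptions chosen only for illustration.

```python
import math


def bloom_parameters(num_items: int, false_positive_rate: float) -> tuple[int, int]:
    """Return (bits, hash_functions) for a classic bloom filter.

    Uses the textbook formulas m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
    """
    bits = math.ceil(-num_items * math.log(false_positive_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / num_items * math.log(2)))
    return bits, hashes


# 20 million entries, as in the example database size given in the text.
# The target rates below are illustrative assumptions.
for rate in (0.10, 0.01):
    bits, hashes = bloom_parameters(20_000_000, rate)
    print(f"target rate {rate:.0%}: {bits / 8 / 1e6:.1f} MB, {hashes} hash functions")
```

At a 10% target rate the bitmask is on the order of 12 megabytes, which is small next to the on-disk database it summarizes. A lower target rate costs more memory but sends fewer needless queries to the database.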
- FIG. 4A illustrates an example of a URL (402) and FIGS. 4B-4D illustrate portions of URL 402. In particular, FIG. 4B illustrates URL 402 up through the first subpath, FIG. 4C illustrates the hostname portion of URL 402, and FIG. 4D illustrates the domain portion of URL 402. Portions 404-408 are also referred to herein as “URLs 404-408.” In some embodiments, in the processing described in more detail below, a match against the most specific portion of URL 402 (e.g., URL 404) will be first attempted, with fallbacks to more generalized versions of the URL (e.g., URLs 406 and 408, respectively).
- Suppose Alice would like to visit URL 402 (the California-specific front page of an online news service) and enters that URL into her browser. In some embodiments, the URL is evaluated by appliance 102 as follows. In the first stage of the evaluation, the data plane consults cache 306 for the presence of each of URLs 404, 406, and 408, in order, until a match is found. If one of the URLs is present, the associated category that is also stored in cache 306 is used to enforce any applicable policies 318. If none of the URLs are present in cache 306, a temporary entry is inserted into cache 306 indicating that the URL is being resolved. As one example, a URL being resolved is assigned a temporary category of “UNRESOLVED.” In some embodiments, an entry for each of URLs 404-408 (and a corresponding status of “UNRESOLVED”) is included in cache 306. In other embodiments, only one entry is made, such as an entry for URL 404. Additional requests received by appliance 102 for access to URL 402 (or portions thereof) will be queued pending the resolution. In various embodiments, a timeout condition is placed on UNRESOLVED entries included in cache 306, such that if the entry is not updated within a specified period of time, the entry is removed.
- Assuming the URL remains unresolved, the data plane sends a request to the management plane for evaluation of the URL. The next stage of evaluation is for the management plane to perform a match against bloom filter 308. URL 404 is checked first, as follows: URL 404 is transformed as applicable (e.g., an MD5 hash of URL 404 is computed). For the remainder of the discussion of this example, no distinction will be made between the URL and the MD5 (or other transformation) of the URL, to aid in clarity. It is to be assumed that if database 312 stores MD5 hashes, the queries performed against it (and the corresponding bloom filter and queries against the bloom filter) will be performed using MD5 (or other applicable) transformations of URLs.
- A REJECT response, if received from bloom filter 308 for URL 404, indicates with 100% confidence that URL 404 is not present in database 312. An ACCEPT response indicates that URL 404 is present in database 312, subject to a given false positive rate. The desired false positive rate of bloom filter 308 is configurable and is in some embodiments set at 10%, meaning that an ACCEPT response indicates, with 90% confidence, that the URL is present in database 312. Additional detail of how elements 308, 310, and 312 are used to process URLs is provided with reference to FIGS. 5A-5C.
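- The REJECT/ACCEPT behavior described above can be sketched with a small bitmask-backed bloom filter. This is a minimal sketch under stated assumptions, not the appliance's implementation: the class name `BloomFilter`, the helper `transform`, the use of SHA-256 with double hashing to derive bit positions, and the example hostnames are all hypothetical. The specification only states that the filter is compiled from database 312, may be a bitmask, and has a configurable false positive rate.

```python
import hashlib
import math


def transform(url: str) -> bytes:
    """MD5 transformation of a URL, as in the example transformation in the text."""
    return hashlib.md5(url.encode("utf-8")).digest()


class BloomFilter:
    """Bitmask bloom filter sized from an expected item count and a target rate."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        ln2 = math.log(2)
        self.num_bits = max(
            1, math.ceil(-expected_items * math.log(false_positive_rate) / ln2**2)
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * ln2))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Assumption: derive k positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def match(self, key: bytes) -> str:
        """REJECT means definitely absent; ACCEPT means possibly present."""
        for pos in self._positions(key):
            if not self.bits[pos // 8] & (1 << (pos % 8)):
                return "REJECT"
        return "ACCEPT"


# Compile the filter from a (hypothetical) set of database entries.
database_urls = [f"host{i}.example.com" for i in range(1000)]
bloom = BloomFilter(expected_items=len(database_urls), false_positive_rate=0.10)
for url in database_urls:
    bloom.add(transform(url))
```

Every URL that was added matches with ACCEPT, so a REJECT can safely skip the cache and database lookups. A URL that was never added may still occasionally match with ACCEPT, which is why an ACCEPT has to be confirmed by a subsequent query.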
FIGS. 5A-5C illustrate representations of processing performed by a policy enforcement appliance in some embodiments. In the examples shown, assume URL 408 is present in database 312 (i.e., an MD5 hash of URL 408 is present), while URLs 404 and 406 are not, and that bloom filter 308 will indicate a false positive for URL 404.

First, a match is performed using URL 404 (502). Bloom filter 308 reports an ACCEPT (504), meaning that there is a 90% chance that URL 404 is present in database 312. Cache 310 is evaluated for the presence of URL 404 (506). URL 404 is not present in the cache (508), and so a query of database 312 is performed using URL 404 (510). As mentioned above, the acceptance of URL 404 by the bloom filter was a false positive: URL 404 is not present in database 312. Accordingly, the query of database 312 for URL 404 will also fail (512).

Next, a match against bloom filter 308 for URL 406 is performed (532). The bloom filter reports a REJECT (534), indicating with 100% confidence that the URL is not present in database 312. There is accordingly no need to perform lookups against cache 310 or database 312 using URL 406.

Finally, a match against bloom filter 308 for URL 408 is performed (572). The bloom filter reports an ACCEPT (574), meaning that there is a 90% chance that URL 408 is present in database 312. Cache 310 is evaluated for the presence of URL 408 (576). URL 408 is not present in the cache (578), and so a query of database 312 is performed using URL 408 (580). In this case, URL 408 is present in database 312 and so the corresponding category NEWS is returned (582) and ultimately provided to data plane 304, which will update the entry in cache 306 by changing the UNRESOLVED category to NEWS. In some embodiments, only the finally matched URL (408) is updated in cache 306. In other embodiments, entries for each of URLs 404-408 are updated in cache 306 with a NEWS category. The category will be used by the firewall to enforce any applicable rules.
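The walk-through of FIGS. 5A-5C can be sketched as a loop over the candidate URLs. This is a simplified illustration under stated assumptions: the function name `resolve`, the `trace` list, and the example URL strings are hypothetical, and a plain Python set of accepted keys stands in for bloom filter 308 so that the false positive for URL 404 can be forced.

```python
import hashlib


def md5_key(url: str) -> str:
    """Transform a URL into the key used by the filter, cache, and database."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def resolve(candidates, bloom_accepts, cache, database, trace):
    """Return the category of the first candidate found in the database, else None.

    bloom_accepts is a set of keys for which the (stand-in) Bloom filter
    reports ACCEPT; it may contain false positives.
    """
    for url in candidates:
        key = md5_key(url)
        if key not in bloom_accepts:
            trace.append((url, "REJECT"))     # definitely absent: skip cache and database
            continue
        if key in cache:
            trace.append((url, "CACHE_HIT"))  # ACCEPT, answered without a database query
            return cache[key]
        category = database.get(key)          # ACCEPT and cache miss: query the database
        if category is None:
            trace.append((url, "DB_MISS"))    # the ACCEPT was a false positive
            continue
        trace.append((url, "DB_HIT"))
        cache[key] = category                 # remember the result for later requests
        return category
    return None


# Hypothetical stand-ins for URLs 404, 406, and 408.
u404 = "news.example.com/california/index.html"
u406 = "news.example.com/california"
u408 = "news.example.com"

database = {md5_key(u408): "NEWS"}
bloom_accepts = {md5_key(u404), md5_key(u408)}  # u404 is a forced false positive
cache, trace = {}, []
category = resolve([u404, u406, u408], bloom_accepts, cache, database, trace)
```

With these inputs only two database queries are issued (for the stand-ins for URLs 404 and 408). The REJECT for the middle candidate avoids both the cache lookup and the database query.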
In the example of FIGS. 5A-5C, Alice's attempt to access URL 402 with her browser will be allowed, because her request has been associated with an attempt to access a NEWS site, which is a permissible use. Cache 310 is also updated to include the returned category and URL 408 (i.e., its MD5 hash). In some embodiments, cache 310 is also updated when result 512 is returned. In that case, URL 404 is included in cache 310 along with a category of UNKNOWN. In various embodiments, when result 582 is returned, the UNKNOWN category included in cache 310 for URL 404 is modified to match the result.

Returning to the description of FIG. 3, assume that none of URLs 404-408 are present in database 312. The next phase of evaluation performed by the management plane would be to consult cache 314 to see if any of the URLs are present therein. As with the previous phases, if one of the URLs is present, the corresponding category (e.g., “NEWS”) will be returned as a result and can be used by the firewall in policy enforcement (and included in cache 306). If the URLs are also absent from cache 314, one or more remote URL servers, such as URL server 316, are queried. In some embodiments, URL server 316 is made available by the provider of the contents of database 312, and contains URL information that supplements the information included in database 312 (e.g., by including many millions of additional URLs and corresponding categories). URL server 316 can also be under the control of the owner of appliance 102 or any other appropriate party. In various embodiments, a bloom filter corresponding to the data stored by URL server 316 is included in appliance 102.

In the event that URLs 404-408 are also absent from URL server 316, a category of UNKNOWN will be returned and appropriate policies applied, based on the category, such as by blocking access to URL 402. Cache 306 can also be updated by switching the temporary category of UNRESOLVED to UNKNOWN. As with cache 310, cache 314 is updated based on results returned by URL server 316. In some embodiments, URLs with UNKNOWN categorization have a timeout, thus allowing for resolution of the categorization during a subsequent request.
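The handling of the temporary UNRESOLVED category and the timeout on UNKNOWN entries can be sketched as a small cache class. This is a rough illustration, not the appliance's data structure: the class name `DataPlaneCache`, its method names, the 5-second default, and the injectable `clock` parameter are all assumptions introduced here.

```python
import time

UNRESOLVED = "UNRESOLVED"
UNKNOWN = "UNKNOWN"


class DataPlaneCache:
    """Illustrative stand-in for a data-plane category cache (hypothetical)."""

    def __init__(self, timeout_seconds: float = 5.0, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock        # injectable so the expiry logic can be tested
        self.entries = {}         # url -> (category, time the entry was written)

    def mark_unresolved(self, url: str) -> None:
        # Placeholder inserted while the management plane resolves the URL.
        self.entries[url] = (UNRESOLVED, self.clock())

    def resolve(self, url: str, category=None) -> None:
        # Replace the placeholder with the final category, or UNKNOWN if none was found.
        self.entries[url] = (category or UNKNOWN, self.clock())

    def lookup(self, url: str):
        entry = self.entries.get(url)
        if entry is None:
            return None
        category, written_at = entry
        # UNRESOLVED and UNKNOWN entries expire so that a later request can retry.
        if category in (UNRESOLVED, UNKNOWN) and self.clock() - written_at > self.timeout:
            del self.entries[url]
            return None
        return category
```

In this sketch a resolved category such as NEWS never expires. A real cache would presumably also bound its size and age out resolved entries, which the text above does not specify.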
FIG. 6 illustrates an embodiment of a process for enforcing a policy based at least in part on URL information. In some embodiments, the process shown in FIG. 6 is performed by policy enforcement appliance 102 and, in various embodiments, multiple instances of the process shown in FIG. 6 or portions thereof are performed in parallel on appliance 102, as applicable. The process begins at 602 when a URL is received. As one example, at 602 a URL is received when data plane 304 extracts a URL out of a packet received from client 104. At 604, the URL is matched against a bloom filter. In various embodiments, what is matched is a portion of the URL (e.g., portions 404-408 of URL 402), and/or a transformation of the URL (e.g., an MD5 hash of the URL or URL portion). As one example of the processing performed at 604, URL 404 is matched against bloom filter 308 (as illustrated in FIG. 5A at 502). As an alternate example of the processing performed at 604, URL 408 is matched against bloom filter 308 (as illustrated in FIG. 5C at 572). At 606, a first query is performed, based on a result of the match. As one example of the processing performed at 606, query 510 is performed. As an alternate example of the processing performed at 606, query 580 is performed. Finally, at 608, a policy is enforced based at least in part on a category received as a result of a second query. As one example of the processing performed at 608, a policy is enforced based on the receipt of the “NEWS” category. In some cases, the first and second query may be different (e.g., where the first query is query 576 and the second query is 580; where the first query is query 506 or query 510 and the second query is query 580; or where the first query is performed against database 312 and the second query is performed against cache 314 or remote URL server 316). In some cases, such as where cache 310 is omitted, the first and second query may be the same (e.g., where the first and second queries are both query 580).
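The steps of FIG. 6 can be sketched as a single function. This is only one possible reading, under stated assumptions: the function name `evaluate_request`, the dictionary-based `database`, `fallback`, and `policies` arguments, and the "allow"/"block" action strings are hypothetical, and a set of accepted keys stands in for the bloom filter. Here the second query is a fallback tier consulted only when the first query yields no category.

```python
def evaluate_request(url_key, bloom_accepts, database, fallback, policies,
                     default_action="block"):
    """Return (category, action) for a received URL key (step 602).

    604: match the key against the (stand-in) Bloom filter.
    606: perform the first query only if the filter reports ACCEPT.
    608: enforce a policy on the category obtained, falling back to a
         second query (e.g., another cache or a remote server) if needed.
    """
    category = None
    if url_key in bloom_accepts:                      # 604: ACCEPT
        category = database.get(url_key)              # 606: first query
    if category is None:
        category = fallback.get(url_key, "UNKNOWN")   # second query, else UNKNOWN
    action = policies.get(category, default_action)   # 608: enforce
    return category, action


policies = {"NEWS": "allow"}  # hypothetical policy table; anything else is blocked
```

For example, a key found in `database` with category NEWS is allowed, while a key absent from every tier is categorized as UNKNOWN and blocked, matching the blocking example given earlier for uncategorized URLs.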
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/286,164 US20190266210A1 (en) | 2011-05-19 | 2019-02-26 | Using cache and bloom filters for url lookups |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/111,131 US10268656B1 (en) | 2011-05-19 | 2011-05-19 | Using cache and bloom filters for URL lookups |
US16/286,164 US20190266210A1 (en) | 2011-05-19 | 2019-02-26 | Using cache and bloom filters for url lookups |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/111,131 Continuation US10268656B1 (en) | 2011-05-19 | 2011-05-19 | Using cache and bloom filters for URL lookups |
Publications (1)
Publication Number | Publication Date |
---|---|
US20190266210A1 (en) | 2019-08-29 |
Family
ID=66175043
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/111,131 Active 2033-04-12 US10268656B1 (en) | 2011-05-19 | 2011-05-19 | Using cache and bloom filters for URL lookups |
US16/286,164 Pending US20190266210A1 (en) | 2011-05-19 | 2019-02-26 | Using cache and bloom filters for url lookups |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/111,131 Active 2033-04-12 US10268656B1 (en) | 2011-05-19 | 2011-05-19 | Using cache and bloom filters for URL lookups |
Country Status (1)
Country | Link |
---|---|
US (2) | US10268656B1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10949486B2 (en) * | 2017-09-20 | 2021-03-16 | Citrix Systems, Inc. | Anchored match algorithm for matching with large sets of URL |
US10747667B2 (en) * | 2018-11-02 | 2020-08-18 | EMC IP Holding Company LLC | Memory management of multi-level metadata cache for content-based deduplicated storage |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6799251B1 (en) * | 2000-08-29 | 2004-09-28 | Oracle International Corporation | Performance-based caching |
US6920477B2 (en) * | 2001-04-06 | 2005-07-19 | President And Fellows Of Harvard College | Distributed, compressed Bloom filter Web cache server |
US7065536B2 (en) * | 2002-12-31 | 2006-06-20 | International Business Machines Corporation | Automated maintenance of an electronic database via a point system implementation |
US7565425B2 (en) * | 2003-07-02 | 2009-07-21 | Amazon Technologies, Inc. | Server architecture and methods for persistently storing and serving event data |
US7454418B1 (en) | 2003-11-07 | 2008-11-18 | Qiang Wang | Fast signature scan |
US7870161B2 (en) | 2003-11-07 | 2011-01-11 | Qiang Wang | Fast signature scan |
US8229930B2 (en) * | 2010-02-01 | 2012-07-24 | Microsoft Corporation | URL reputation system |
Also Published As
Publication number | Publication date |
---|---|
US10268656B1 (en) | 2019-04-23 |
Legal Events
Code | Title | Description |
---|---|---|
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STCV | Information on status: appeal procedure | Free format text: NOTICE OF APPEAL FILED |
STCV | Information on status: appeal procedure | Free format text: APPEAL BRIEF (OR SUPPLEMENTAL BRIEF) ENTERED AND FORWARDED TO EXAMINER |
STCV | Information on status: appeal procedure | Free format text: EXAMINER'S ANSWER TO APPEAL BRIEF MAILED |
STCV | Information on status: appeal procedure | Free format text: ON APPEAL -- AWAITING DECISION BY THE BOARD OF APPEALS |
STCV | Information on status: appeal procedure | Free format text: BOARD OF APPEALS DECISION RENDERED |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION |