CN110677270B - Domain name cacheability analysis method and system
- Publication number: CN110677270B (application CN201810720010.0A)
- Authority: CN (China)
- Prior art keywords: domain name, cacheable, resource, cache, cacheability
- Legal status: Active (listed status only; not a legal conclusion)
Classifications
- H04L41/14: Network analysis or design (under H04L41/00, arrangements for maintenance, administration or management of data switching networks)
- H04L61/4511: Network directories and name-to-address mapping using standardised directories or directory access protocols, using the domain name system [DNS]
Abstract
The invention provides a method and a system for analyzing the cacheability of a domain name. The method comprises: collecting domain name data and importing it into an original domain name database; extracting, from the domain name data, the domain name to be analyzed and its cacheable feature data, and temporarily storing them; calculating a cache value for the domain name from the cacheable feature data using a cacheability analysis algorithm, and judging the cacheability of the domain name by comparing the cache value with a preset value; and outputting the cacheability judgment result. The method and system can determine whether a domain name is suitable for caching and provide a caching suggestion.
Description
Technical Field
The invention relates to the field of internet communication technology, and in particular to a method and a system for analyzing the cacheability of a domain name.
Background
A domain name is the name of a computer or group of computers on the internet. It consists of a sequence of labels separated by dots and identifies the location of the computer during data transmission. As an important identifier of internet access units and of people on the network, a domain name makes it easy for others to identify and retrieve the information resources of an enterprise, organization, or individual, which supports resource sharing on the network.
The domain name system is an address translation system established to make addresses easier to remember. Domain name resolution is the service that maps a domain name to the IP (Internet Protocol) address of a website's hosting space, so that people can reach the website through its registered domain name. Accessing a server on the internet ultimately requires an IP address, and domain name resolution is the process of converting a domain name into that IP address. A domain name cache is an optimization mechanism in which the domain name system stores data so that it can be read quickly and repeated resource requests are avoided; an effective cache avoids repeated resolution requests and returns IP addresses quickly. However, caching domain names indiscriminately can occupy a large amount of memory, and at present there is no method for analyzing whether a domain name is suitable for caching.
Without such a method, it cannot be determined whether a given domain name is suitable for caching.
Disclosure of Invention
In view of the above, the present invention provides a method and a system for analyzing cacheability of a domain name to determine whether the domain name is suitable for caching and to provide a caching suggestion.
To solve the above problem, in a first aspect, an embodiment of the present invention provides a method for analyzing the cacheability of a domain name, comprising: collecting domain name data and importing it into an original domain name database; extracting, from the original domain name data, the domain name to be analyzed and its cacheable feature data, and temporarily storing them; calculating a cache value for the domain name from the cacheable feature data using a cacheability analysis algorithm, and judging the cacheability of the domain name by comparing the cache value with a preset value; and outputting the cacheability analysis and judgment result for the domain name to be analyzed.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, in which the cacheable feature data includes a URL (Uniform Resource Locator), an internet resource type, a resource attribute, and the expiration time of a document. The internet resource type is classified according to the service characteristics of the resource. The resource attribute is either dynamic or static: a dynamic resource is one that is invoked after data transformation, and a static resource is one that is invoked directly.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, in which the calculating and judging step includes calculating two indexes from the cacheable feature data of the domain name to be analyzed: the proportion of static resources in the total number of URL resources, and the proportion of static resources whose document expiration time is greater than the cache time limit.
The cache value of the domain name is then obtained as the weighted sum of the two index values and their respective weights. When the cache value is greater than a preset cache threshold, the cacheability of the domain name is judged as a suggested cache; when the cache value is less than or equal to the preset cache threshold, it is judged as a non-suggested cache.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect. In this implementation, for resources whose resource attribute is static, the cacheable feature data further includes a resource file length and a resource file change period. Static resources are divided into cacheable objects and non-cacheable objects: a cacheable object is a static resource whose file change period exceeds a set period, and a non-cacheable object is a static resource whose file change period is less than or equal to the set period. The calculating and judging step comprises the following:
The step first calculates five indexes from the cacheable feature data of the domain name to be analyzed:
- the proportion of static resources in the total number of URL resources;
- the proportion of cacheable objects among the static resources;
- the proportion of non-cacheable objects among the static resources;
- the proportion of large files among the cacheable static objects;
- the proportion of static resources whose document expiration time is greater than the cache time limit.
It then calculates the cache value of the domain name from the five indexes using a weighted-sum formula. Finally, the cacheability of the domain name is judged as a suggested cache when the cache value is greater than a preset cache threshold, and as a non-suggested cache when the cache value is less than or equal to that threshold.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, in which, after the outputting step, the method further includes a step of establishing a cacheable domain name library. The library is built from the cacheability judgment results and is used to store, query, update, and analyze cacheable domain names, dynamically maintaining the current set of cacheable domain names. Once the library exists, the collecting step proceeds as follows: after domain name data is collected, it is compared against the domain names in the library. If information for the domain name to be analyzed is already stored in the library, the method proceeds directly to the outputting step; if not, the collected domain name data is imported into the original domain name database.
Each entry in the domain name library may include one or more of the following: the domain name, the attribution of the domain name, the resource type, the cacheable proportion, whether HTTPS (Hypertext Transfer Protocol over Secure Socket Layer) is used, the server port, the cache suggestion, and the probe time.
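The library check that precedes analysis might look like the minimal sketch below. It assumes an in-memory dictionary stands in for the cacheable domain name library and that the library records the cache suggestion for every analyzed domain; whether non-suggested domains are also kept is itself an assumption. `CacheableDomainLibrary`, `process_domain`, and the `analyze` callback are hypothetical names, and `analyze` stands for the extraction and calculation steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CacheableDomainLibrary:
    """Hypothetical in-memory stand-in for the cacheable domain name library."""
    entries: Dict[str, str] = field(default_factory=dict)  # domain -> cache suggestion

    def lookup(self, domain: str) -> Optional[str]:
        return self.entries.get(domain)

    def update(self, domain: str, suggestion: str) -> None:
        self.entries[domain] = suggestion


def process_domain(domain: str,
                   library: CacheableDomainLibrary,
                   original_db: List[str],
                   analyze: Callable[[str], str]) -> str:
    """Return the cache suggestion for a domain, reusing the library when possible."""
    known = library.lookup(domain)
    if known is not None:
        return known  # library hit: go straight to the output step
    original_db.append(domain)  # library miss: import into the original database
    suggestion = analyze(domain)  # hypothetical extraction + calculation/judgment
    library.update(domain, suggestion)  # keep the library current
    return suggestion
```

A library hit skips the extraction and calculation work entirely. That is where the scheme saves effort on domains that have already been analyzed.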
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, in which, in the collecting step, network resource data is acquired by means of a crawler, DNS logs, DPI (deep packet inspection) data, and packet capture.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect. In this implementation, the total outbound internet traffic is extracted from the original domain name database and the outbound request information is statistically analyzed. A ranking of the total outbound traffic is then produced, giving the user a basis for deciding whether to cache a website.
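The traffic ranking described above could be sketched as follows. The sketch assumes outbound requests are available as hypothetical `(domain, bytes)` pairs drawn from the original domain name database; the function name and input shape are illustrative only.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_outbound_traffic(requests: Iterable[Tuple[str, int]],
                          top_n: int = 10) -> List[Tuple[str, int]]:
    """Sum outbound bytes per domain and return the top_n domains by total traffic."""
    totals: Counter = Counter()
    for domain, num_bytes in requests:
        totals[domain] += num_bytes
    # most_common() orders by total, largest first
    return totals.most_common(top_n)
```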
In a second aspect, an embodiment of the present invention provides a system for analyzing the cacheability of a domain name. The system comprises the following modules:
- an acquisition module, configured to collect domain name data and import it into an original domain name database;
- an extraction module, configured to extract the domain name to be analyzed and its cacheable feature data from the domain name data, and to store them temporarily;
- a calculation and judgment module, configured to calculate the cache value of the domain name from the cacheable feature data obtained by the extraction module using a cacheability analysis algorithm, and to judge the cacheability of the domain name by comparing the cache value with a preset value;
- an output module, configured to output the cacheability analysis and judgment result produced by the calculation and judgment module.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, in which the cacheable feature data includes a URL, an internet resource type, a resource attribute, and the expiration time of the resource document. The internet resource type is classified according to the service characteristics of the resource. The resource attribute is either dynamic (a resource invoked after data transformation) or static (a resource invoked directly). For static resources, the cacheable feature data further includes the resource file length and the resource file change period. Static resources are divided into cacheable objects, whose file change period exceeds a set period, and non-cacheable objects, whose file change period is less than or equal to the set period.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, in which the calculation and judgment module includes a calculation unit and a judgment unit, and the calculation unit includes a first calculation unit and a second calculation unit.
The first calculation unit is configured to calculate two indexes from the cacheable feature data of the domain name to be analyzed: the proportion of static resources in the total number of resources, and the proportion of static resources whose document expiration time is greater than the cache time limit. It then obtains the cache value of the domain name as the weighted sum of the two index values and their respective weights.
When the cache value is greater than a preset cache threshold, the cacheability of the domain name is judged as a suggested cache; when it is less than or equal to the preset cache threshold, it is judged as a non-suggested cache.
The second calculation unit is configured to calculate five indexes from the cacheable feature data of the domain name to be analyzed:
- the proportion of static resources in the total number of resources;
- the proportion of cacheable objects among the static resources;
- the proportion of non-cacheable objects among the static resources;
- the proportion of large files among the cacheable static objects;
- the proportion of static resources whose document expiration time is greater than the cache time limit.
It then obtains the cache value of the domain name as the weighted sum of the five index values and their respective weights.
The judgment unit is configured to make a judgment from the cache value obtained by the first or the second calculation unit. The cacheability of the domain name is judged as a suggested cache when the cache value is greater than a preset cache threshold, and as a non-suggested cache when the cache value is less than or equal to that threshold.
With reference to the foregoing implementations of the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, in which the system further includes a library establishing module. This module is configured to establish a cacheable domain name library from the cacheability judgment results output by the output module. The library is used to store, query, update, and analyze cacheable domain names, and to dynamically maintain the current set of cacheable domain names. After collecting domain name data, the acquisition module compares it against the domain names in the library. If information for the domain name to be analyzed is already stored there, that information is sent directly to the output module; otherwise, the collected domain name data is imported into the original domain name database and the extraction module is invoked.
With reference to the second aspect, an embodiment of the present invention provides a fourth possible implementation of the second aspect, in which the acquisition module includes a crawler module, a DNS log analysis module, a DPI data analysis module, and a packet capture module. These collect network resource data by crawling, DNS log analysis, DPI data analysis, and packet capture, respectively.
In the embodiments of the invention, the method proceeds in five stages:
1. Domain name data is collected and imported into an original domain name database.
2. The domain name to be analyzed and its cacheable feature data are extracted from that database.
3. A cache value is calculated from the cacheable feature data using a cacheability analysis algorithm.
4. The cacheability of the domain name is judged from the cache value.
5. The judgment result is output.

The method and system can therefore judge whether a domain name is suitable for caching and give a caching suggestion. Following these suggestions, cacheable domain names are added to the cache and non-cacheable ones are removed. This reduces caching of inefficient domain names with low hit rates, which avoids wasting memory and other resources. It also increases caching of efficient domain names with high hit rates, which improves access efficiency.
To make the above and other objects, features, and advantages of the present invention easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a flowchart illustrating a method for cacheability analysis of a domain name according to a first embodiment of the present invention;
Fig. 2 is a detailed flowchart of step S300 in the method for analyzing the cacheability of a domain name according to the first embodiment of the present invention;
Fig. 3 is a schematic block diagram of a system for analyzing the cacheability of a domain name according to a second embodiment of the present invention;
Fig. 4 is a schematic diagram of the calculation and judgment module in the domain name cacheability analysis system according to the second embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Since there is currently no method for analyzing whether a domain name is suitable for caching, embodiments of the present invention provide a method and a system for analyzing the cacheability of a domain name, which are described in detail below.
Example one
Fig. 1 is a schematic flowchart of a method for analyzing cacheability of a domain name according to a first embodiment of the present invention, where the method includes the following steps:
Step S100 (collection step): collecting domain name data and importing it into an original domain name database.
To analyze the cacheability of a domain name, domain name data is first collected, organized into records with the required data structure, and imported into the original domain name database. The cacheable feature data can be obtained at the same time as the domain name data is collected. Each domain name and its data are stored as one record.
An example domain name data record contains the following fields: the domain name, the attribution of the domain name, the URL, the internet resource type, the resource attribute, whether HTTPS is used, the expiration time of the resource document, the server port, the probe time, and so on. This fixes both the field contents and the field order of the data structure. Some of these fields carry cacheability features, namely the URL, the internet resource type, the resource attribute, and the document expiration time.
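One way to hold such a record in memory is a dataclass whose fields follow the order listed above. This is a sketch only. The field names, the use of seconds for the expiration time, and the `cacheable_features` helper are assumptions, not part of the source.

```python
from dataclasses import dataclass
from typing import ClassVar, Dict, Optional, Tuple


@dataclass
class DomainRecord:
    """One record in the original domain name database (illustrative field names)."""
    domain: str
    attribution: str                # attribution of the domain name
    url: str
    resource_type: str              # internet resource type, e.g. "video", "web page"
    resource_attribute: str         # "static" or "dynamic"
    https: bool
    expires_seconds: Optional[int]  # document expiration time, assumed in seconds
    server_port: int
    probe_time: str

    # Fields that carry cacheability features (a class constant, not a dataclass field).
    CACHEABILITY_FIELDS: ClassVar[Tuple[str, ...]] = (
        "url", "resource_type", "resource_attribute", "expires_seconds")

    def cacheable_features(self) -> Dict[str, object]:
        """Project the record onto its cacheability fields (hypothetical helper)."""
        return {name: getattr(self, name) for name in self.CACHEABILITY_FIELDS}
```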
In the collecting step, network resource data may be collected from the internet by a crawler, preferably a distributed web crawler. The distributed web crawler comprises multiple crawlers that download web pages from the internet, for example from heavily visited news websites such as Sina and Sohu. The crawlers obtain the domain name data related to those websites for the subsequent cacheability analysis of domain name resources. The resource data of each domain name (its domain name data) is imported into the database, and at the same time URLs are extracted from the page content so that crawling can continue along them.
To suit different requirements, the parameters of the crawling strategy can be customized, and the crawling intensity and crawling mode can be chosen to fit actual conditions. For example, the crawl depth can be increased or reduced according to the actual load of the probe node. Different crawling modes can also be configured, including direct crawling and search-engine-assisted crawling. Direct crawling is more accurate. Search-engine-assisted crawling has wider coverage and can reach websites that cannot be crawled directly, either because of security protections or because they lack a direct entry point.
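The depth control described above can be illustrated with a single-process breadth-first crawl. This is a hedged sketch, not the distributed crawler the text prefers. `fetch_links` is a hypothetical callback that stands in for downloading a page and extracting its URLs, and `max_depth` plays the role of the adjustable crawl depth.

```python
from collections import deque
from typing import Callable, Dict, List, Set
from urllib.parse import urlsplit


def crawl(seeds: List[str],
          fetch_links: Callable[[str], List[str]],
          max_depth: int) -> Dict[str, Set[str]]:
    """Breadth-first crawl up to max_depth; returns visited URLs grouped by host name."""
    seen: Set[str] = set(seeds)
    queue = deque((url, 0) for url in seeds)
    by_domain: Dict[str, Set[str]] = {}
    while queue:
        url, depth = queue.popleft()
        host = urlsplit(url).hostname or ""
        by_domain.setdefault(host, set()).add(url)
        if depth >= max_depth:
            continue  # depth limit reached: record the URL but do not expand it
        for link in fetch_links(url):  # hypothetical download-and-parse step
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return by_domain
```

Lowering `max_depth` when the probe node is heavily loaded shrinks the crawl. URLs at the depth limit are still recorded, but their links are not followed.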
Besides crawling and collecting network resource data through a crawler, the network resource data can be collected through other modes such as DNS logs, DPI data and packet capturing.
Step S200 (extraction step): extracting the domain name to be analyzed and its cacheable feature data from the original domain name data, and temporarily storing them.
The domain name data acquired in step S100 contains a great deal of information, but the method of the first embodiment only needs part of it. The domain name to be analyzed and its cacheable feature data are therefore extracted from the collected data. Cacheable data is data that is easy to invoke and can be used efficiently over a long period. The cacheable feature data includes, for example, the URL, the internet resource type, the resource attribute, and the document expiration time. The internet resource type is classified according to the service characteristics of the resource, such as video or web page. The resource attribute is either dynamic or static. A dynamic resource is invoked after data transformation, for example a resource whose data must be converted from a database into HTML (HyperText Markup Language) before it is served. A static resource can be invoked directly, for example a URL without any parameters.
Static resource attributes include a cacheable object attribute. This refers to resources returned with a cacheable HTTP (HyperText Transfer Protocol) status code, namely 200, 203, 300, or 301. Static resource attributes also include non-cacheable object attributes, which cover the following cases:
- an HTTP/1.0 response containing Set-Cookie;
- an HTTP/1.1 response containing Set-Cookie whose Cache-Control is "no-cache" or "private";
- a "Pragma: no-cache" field;
- an "Authorization" field;
- a Cache-Control field of "no-cache", "no-store", or "private";
- the absence of Last-Modified information.

Static resource attributes further include a file length attribute and a file change period attribute.
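The static/dynamic split and the non-cacheable rules above might be coded as a pair of predicates. This sketch assumes response headers arrive as a name-to-value mapping and the HTTP version as the string "1.0" or "1.1". Treating a URL without a query string as static follows the text's example and is only a heuristic. `is_static` and `is_cacheable_response` are hypothetical names.

```python
from typing import Mapping
from urllib.parse import urlsplit

# Status codes the text lists as cacheable.
CACHEABLE_STATUS = {200, 203, 300, 301}


def is_static(url: str) -> bool:
    """Heuristic from the text: a URL without query parameters is treated as static."""
    return urlsplit(url).query == ""


def is_cacheable_response(status: int, http_version: str,
                          headers: Mapping[str, str]) -> bool:
    """Apply the cacheable status codes and the non-cacheable header rules."""
    h = {name.lower(): value.lower() for name, value in headers.items()}
    directives = {d.strip() for d in h.get("cache-control", "").split(",") if d.strip()}
    if status not in CACHEABLE_STATUS:
        return False
    if http_version == "1.0" and "set-cookie" in h:
        return False
    # Also covers the HTTP/1.1 Set-Cookie case: no-cache/private already rule it out.
    if directives & {"no-cache", "no-store", "private"}:
        return False
    if "no-cache" in h.get("pragma", "") or "authorization" in h:
        return False
    return "last-modified" in h  # missing Last-Modified information -> not cacheable
```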
To make the cacheable feature data easy to access in the subsequent calculations, it is temporarily stored in a suitable data structure and indexed.
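A simple choice for the temporary store is an index keyed by domain name, so that all feature rows for one domain can be fetched in a single lookup. The sketch assumes each row is a mapping with a hypothetical `domain` key.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def build_feature_index(rows: Iterable[Mapping[str, object]]
                        ) -> Dict[str, List[Mapping[str, object]]]:
    """Group feature rows by their (hypothetical) 'domain' key, preserving input order."""
    index: Dict[str, List[Mapping[str, object]]] = defaultdict(list)
    for row in rows:
        index[str(row["domain"])].append(row)
    return dict(index)
```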
Step S300 (calculation and judgment step): a cache value of the domain name is calculated using a cacheability analysis algorithm, based on the cacheable feature data of the domain name to be analyzed obtained in step S200. The cacheability of the domain name is then determined from the cache value.
The cacheability analysis algorithm evaluates whether the cacheable feature data indicates that the resources are easy to invoke, relatively stable, and usable over a long period. It calculates the cache value of the domain name by weighted summation and judges the cacheability of the domain name by comparing the cache value with a preset value.
A simple algorithm calculates two indexes from the cacheable feature data of the domain name to be analyzed: the proportion of static resources in the total number of URL resources, and the proportion of static resources whose document expiration time is greater than the cache time limit.
The cache value of the domain name is then obtained as the weighted sum of the two index values and their respective weights. When the cache value is greater than a preset cache threshold, the cacheability of the domain name is judged as a suggested cache; when it is less than or equal to the preset cache threshold, it is judged as a non-suggested cache.
This method is simple to compute. It only considers the proportion of static resources and the proportion of static resources whose expiration time exceeds the cache time limit.
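The two-index variant can be sketched as follows. The source gives no weights, threshold, or cache time limit for this variant, so `W_STATIC`, `W_EXPIRY`, `THRESHOLD`, and `CACHE_TIME_LIMIT` below are placeholder assumptions. Each resource is reduced to a hypothetical `(is_static, expiration_seconds)` pair.

```python
from typing import Sequence, Tuple

# Placeholder parameters (assumptions; not given in the source for this variant).
W_STATIC, W_EXPIRY = 0.5, 0.5
THRESHOLD = 0.6
CACHE_TIME_LIMIT = 600  # seconds


def two_index_cache_value(resources: Sequence[Tuple[bool, int]],
                          w_static: float = W_STATIC,
                          w_expiry: float = W_EXPIRY,
                          cache_time_limit: int = CACHE_TIME_LIMIT) -> float:
    """resources: one (is_static, expiration_seconds) pair per URL under the domain."""
    if not resources:
        return 0.0
    static = [exp for is_static, exp in resources if is_static]
    static_ratio = len(static) / len(resources)
    long_lived_ratio = (sum(exp > cache_time_limit for exp in static) / len(static)
                        if static else 0.0)
    return w_static * static_ratio + w_expiry * long_lived_ratio


def judge(cache_value: float, threshold: float = THRESHOLD) -> str:
    """Strictly above the threshold -> suggested cache."""
    return "suggested cache" if cache_value > threshold else "non-suggested cache"
```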
There is also a more accurate and refined algorithm. Fig. 2 is a detailed flowchart of step S300 in the method for analyzing the cacheability of a domain name according to the first embodiment of the present invention. The step comprises the following three sub-steps:
Step S301: calculating five indexes from the cacheable feature data:
- the proportion of static resources in the total number of URL resources;
- the proportion of cacheable objects among the static resources;
- the proportion of non-cacheable objects among the static resources;
- the proportion of large files among the cacheable static objects;
- the proportion of static resources whose document expiration time is greater than the cache time limit.

Here, a cacheable object is a static resource whose change period exceeds a set period, and a non-cacheable object is a static resource whose change period is less than or equal to the set period. The set period may be chosen from empirical values and actual conditions and is not limited here. Typically, persistent resources such as JS (JavaScript) files, CSS (Cascading Style Sheets) files, and images are cacheable objects. Resources for which every request must fetch the latest data are non-cacheable objects.
In the first embodiment, to determine the cacheability of a domain name, five indexes are calculated from the cacheable feature data collected by the crawlers, and the five indexes are evaluated together. They are:
(1) t1: the proportion of static resources in the total number of URLs under the domain name.
(2) t2: the proportion of cacheable objects among the static resources.
(3) t3: the proportion of non-cacheable objects among the static resources.
(4) t4: the proportion of large files among the cacheable static objects, where a large file is one larger than fmax, and fmax can be set according to actual requirements.
(5) t5: the proportion of static resources whose document expiration time exceeds the cache time limit, that is, the proportion whose max-age is greater than tmax, where tmax can be set according to actual requirements.
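The five indexes can be computed from per-URL features as shown below. Several choices here are assumptions made for this sketch: the `Resource` fields, the units (seconds and bytes), and treating a ratio with an empty denominator as 0. The parameters `set_period_s`, `fmax_bytes`, and `tmax_s` correspond to the set period, fmax, and tmax in the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Resource:
    """One URL under a domain (illustrative fields and units)."""
    is_static: bool
    change_period_s: float  # resource file change period, seconds
    size_bytes: int         # resource file length
    max_age_s: float        # document expiration time (max-age), seconds


def five_indexes(resources: List[Resource], set_period_s: float,
                 fmax_bytes: int, tmax_s: float
                 ) -> Tuple[float, float, float, float, float]:
    """Return (t1, t2, t3, t4, t5); a ratio with an empty denominator is taken as 0."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    static = [r for r in resources if r.is_static]
    cacheable = [r for r in static if r.change_period_s > set_period_s]
    non_cacheable = [r for r in static if r.change_period_s <= set_period_s]
    t1 = ratio(len(static), len(resources))
    t2 = ratio(len(cacheable), len(static))
    t3 = ratio(len(non_cacheable), len(static))
    t4 = ratio(sum(r.size_bytes > fmax_bytes for r in cacheable), len(cacheable))
    t5 = ratio(sum(r.max_age_s > tmax_s for r in static), len(static))
    return t1, t2, t3, t4, t5
```

By construction t2 + t3 = 1 whenever the domain has at least one static resource.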
Step S302: calculating the cache value of the domain name from the five indexes using a weighted-sum formula.
To evaluate the five indexes together, the following weighted formula can be used, where S is the cache value and w1, w2, w3, w4, and w5 are the weights of t1, t2, t3, t4, and t5, respectively:
$$S = w_1 t_1 + w_2 t_2 + w_3 t_3 + w_4 t_4 + w_5 t_5$$
Substituting each index into this formula yields the cache value S.
Step S303: judging the cacheability of the domain name by comparing the cache value with the preset value.
Specifically, when the cache value is greater than a preset cache threshold, the cacheability of the domain name is judged as a suggested cache. When the cache value is less than or equal to the preset cache threshold, it is judged as a non-suggested cache. Denoting the preset cache threshold by a, the domain name is judged as a suggested cache when S > a and as a non-suggested cache when S ≤ a. The preset cache threshold can be set from empirical values together with the actual usage of hard disk space.
Table 1 lists the five indexes and their limit values, which can be customized according to actual demand, for example according to the actual hard disk occupancy and the cache hit rate. For instance, if the limit for t1 (the ratio of static resources to the total number of URLs) is set to ≤ 0.5, a website whose t1 is greater than 0.5 is judged as not suggested for caching.
Table 2 lists the weights of the five indexes together with fmax, tmax and the preset cache threshold a. These parameters can also be customized according to actual requirements; for example, the threshold a can be chosen according to memory conditions, the domain name situation and the caching target to be achieved. The values in Table 2 are only an example.
TABLE 1
Index | t1 | t2 | t3 | t4 | t5 |
---|---|---|---|---|---|
Limit value | ≤0.5 | ≥0.8 | ≤0.2 | ≤0.2 | ≥0.3 |
TABLE 2
Parameter | w1 | w2 | w3 | w4 | w5 | fmax | tmax | a |
---|---|---|---|---|---|---|---|---|
Value | 0.05 | 0.45 | 0.05 | 0.05 | 0.4 | 10 MB | 10 min | 0.6 |
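The limit values in Table 1 and the parameters in Table 2 can be held as a small configuration, as sketched below. The source states the consequence of a limit explicitly only for t1 (a site with t1 above 0.5 is not suggested for caching); how the other limits combine with the cache value S is not specified, so this hypothetical `limit_violations` helper only reports which indexes fall outside their limits. Reading "10mb" as 10 megabytes is also an assumption.

```python
import operator

# Table 1: (comparison that must hold, limit value) for each index.
INDEX_LIMITS = {
    "t1": (operator.le, 0.5),
    "t2": (operator.ge, 0.8),
    "t3": (operator.le, 0.2),
    "t4": (operator.le, 0.2),
    "t5": (operator.ge, 0.3),
}

# Table 2: example weights and parameters (fmax read as megabytes).
PARAMS = {
    "weights": {"t1": 0.05, "t2": 0.45, "t3": 0.05, "t4": 0.05, "t5": 0.4},
    "fmax_bytes": 10 * 1024 * 1024,
    "tmax_s": 10 * 60,
    "a": 0.6,
}


def limit_violations(indexes, limits=INDEX_LIMITS):
    """Names of the indexes that fall outside their Table 1 limit values."""
    return [name for name, (within, bound) in limits.items()
            if not within(indexes[name], bound)]


domain = {"t1": 0.6, "t2": 0.8, "t3": 0.2, "t4": 0.5, "t5": 0.5}
print(limit_violations(domain))  # ['t1', 't4']
```

Note that the example weights in Table 2 sum to 1, so S stays within [0, 1] whenever every index does.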
Using the parameters in Table 2, suppose the indexes of domain name 1 are, in order, 0.5, 0.8, 0.2, 0.2 and 0.5. Substituting them into the weighted-sum formula gives S1 = 0.605. Since the preset cache threshold is a = 0.6 and S1 > a, domain name 1 is judged as a suggested cache.
If the indexes of domain name 2 are, in order, 0.5, 0.8, 0.2, 0.1 and 0.4, substituting them into the same formula gives S2 = 0.56. Since S2 < a = 0.6, domain name 2 is judged as not recommended for caching.
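Written out term by term with the Table 2 weights, and taking domain name 1's indexes as 0.5, 0.8, 0.2, 0.2 and 0.5 (the sequence that reproduces S1 = 0.605), the two examples work out as follows:

```latex
\begin{aligned}
S_1 &= 0.05(0.5) + 0.45(0.8) + 0.05(0.2) + 0.05(0.2) + 0.4(0.5) \\
    &= 0.025 + 0.36 + 0.01 + 0.01 + 0.2 = 0.605 > a = 0.6, \\
S_2 &= 0.05(0.5) + 0.45(0.8) + 0.05(0.2) + 0.05(0.1) + 0.4(0.4) \\
    &= 0.025 + 0.36 + 0.01 + 0.005 + 0.16 = 0.56 \le a = 0.6.
\end{aligned}
```

The two domains differ only in t4 and t5, and almost all of the 0.045 gap comes from t5 (0.4 × 0.1 = 0.04), reflecting the large weights given to t2 and t5 (together 0.85).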
Step S400, outputting the result of determining the cacheability of the domain name to be analyzed.
In this way, the method comprehensively judges the cacheability of the domain name and obtains the judgment result.
Preferably, the judgment results can be output as a list so that the user can check them conveniently. The output list may include, for example: the domain name, the attribution of the domain name, the internet resource type, the resource attribute, the cacheable proportion, whether HTTPS is used, the server port, the probe time, and the cache suggestion.
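A minimal sketch of the output list in step S400, assuming one row per domain name with the columns named above. The hypothetical `JudgmentRow` class, the pipe-separated rendering and the ISO-8601 timestamp format are illustrative choices, not part of the source.

```python
from dataclasses import dataclass, fields


@dataclass
class JudgmentRow:
    """One row of the output list (column names are illustrative)."""
    domain: str
    attribution: str         # attribution of the domain name
    resource_type: str       # internet resource type
    resource_attribute: str  # "static" or "dynamic"
    cacheable_ratio: float   # cacheable proportion
    https: bool              # whether HTTPS is used
    server_port: int
    probe_time: str          # assumed ISO-8601 timestamp
    suggestion: str          # "suggested cache" or "non-suggested cache"


def to_table(rows):
    """Render rows as a pipe-separated table: header, separator, one line per row."""
    names = [f.name for f in fields(JudgmentRow)]
    lines = [" | ".join(names), " | ".join("---" for _ in names)]
    for row in rows:
        lines.append(" | ".join(str(getattr(row, n)) for n in names))
    return "\n".join(lines)


print(to_table([
    JudgmentRow("example.com", "unknown", "video", "static", 0.8,
                True, 443, "2018-07-03T00:00:00", "suggested cache"),
]))
```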
Further, a cacheable domain name library may be established according to the above cache suggestions. The library may include the domain name, the attribution of the domain name, the resource type, the cacheable proportion, whether HTTPS is used, the server port, the cache suggestion, the probe time, and so on, and is used to store, query, update and analyze domain names. The current library is adjusted accordingly, for example by adding cacheable domain names and deleting non-cacheable ones, so that the latest set of cacheable domain names is maintained dynamically. This reduces caching of domain names with low efficiency and low cache hit rates, avoids wasting resources such as memory, increases caching of domain names with high efficiency and high cache hit rates, and improves access efficiency.
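The add/delete maintenance of the cacheable domain name library can be sketched with an in-memory dictionary keyed by domain name. This is an assumption for illustration only; a deployed library would more likely sit in a database, and the `LibraryEntry` and `CacheableDomainLibrary` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LibraryEntry:
    domain: str
    suggestion: str   # "suggested cache" or "non-suggested cache"
    probe_time: str


class CacheableDomainLibrary:
    """In-memory stand-in for the cacheable domain name library."""

    def __init__(self):
        self._entries = {}  # domain -> LibraryEntry

    def apply(self, entry):
        """Add or refresh a cacheable domain; delete one judged non-cacheable."""
        if entry.suggestion == "suggested cache":
            self._entries[entry.domain] = entry
        else:
            self._entries.pop(entry.domain, None)

    def query(self, domain):
        return self._entries.get(domain)

    def domains(self):
        return sorted(self._entries)


lib = CacheableDomainLibrary()
lib.apply(LibraryEntry("a.example", "suggested cache", "2018-07-01"))
lib.apply(LibraryEntry("b.example", "suggested cache", "2018-07-01"))
lib.apply(LibraryEntry("a.example", "non-suggested cache", "2018-07-03"))
print(lib.domains())  # ['b.example']
```

Feeding each new judgment through `apply` keeps the library limited to domains currently judged worth caching, which is the dynamic maintenance described above.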
It should be noted that the collection step may acquire network resource data in various ways, such as web crawling, DNS logs, DPI (deep packet inspection) data, and packet capture.
In this embodiment, domain name data is first collected and imported into an original domain name database; the domain name to be analyzed and its cacheable feature data are then extracted from that database; a cache value of the domain name is calculated from the cacheable feature data using the cacheable analysis algorithm, and the cacheability of the domain name is judged from the cache value; finally, the judgment result is output. The method therefore gives a cache suggestion by judging whether a domain name is suitable for caching. Following this suggestion, cacheable domain names are added and non-cacheable domain names are deleted, which reduces caching of domain names with low efficiency and low cache hit rates, avoids wasting resources such as memory, increases caching of domain names with high efficiency and high cache hit rates, and improves domain name access efficiency.
The method of the first embodiment further provides a domain name comparison function based on the cacheable domain name library, which is embodied in the collection step as follows. After the domain name data is collected, it is compared with the domain names in the cacheable domain name library. If information for the domain name is already stored in the library, the cacheability suggestion stored there is used directly and the method proceeds straight to the output step; that is, the related information and cache suggestion for the domain name are obtained quickly by comparison. If the library holds no information for the domain name to be analyzed, the collected domain name data is imported into the original domain name database and the method proceeds to the extraction step. This avoids repeated domain name resolution requests, allows the IP address to be read quickly, reduces repeated analysis and judgment, and improves the efficiency of judging domain name cacheability.
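The comparison function amounts to a lookup before analysis. In the sketch below, the library is assumed to be a plain mapping from domain name to stored suggestion, and `analyze` is a hypothetical callable standing in for the extraction, calculation and judgment steps; whether and when new results are written back to the library is left out.

```python
def cacheability(domain, library, analyze):
    """Return (suggestion, from_library) for a collected domain name.

    library: mapping domain -> stored cache suggestion (cacheable domain library)
    analyze: callable that runs extraction + calculation/judgment for a new domain
    """
    stored = library.get(domain)
    if stored is not None:
        return stored, True           # hit: skip straight to the output step
    return analyze(domain), False     # miss: import, extract and judge as usual


analyzed = []


def fake_analyze(domain):
    # Stand-in for the full pipeline; records which domains were analyzed.
    analyzed.append(domain)
    return "non-suggested cache"


library = {"a.example": "suggested cache"}
print(cacheability("a.example", library, fake_analyze))  # ('suggested cache', True)
print(cacheability("b.example", library, fake_analyze))  # ('non-suggested cache', False)
print(analyzed)                                          # ['b.example']
```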
To analyze the cacheability of domain names further, the method of the first embodiment also extracts the whole-network outgoing traffic from the original domain name database, counts and analyzes the outgoing request information, and provides a ranking of the whole-network outgoing traffic as an additional basis for the user to decide whether to cache a website. Preferably, the whole-network outgoing traffic can be mirrored and analyzed: the details of all outgoing requests are counted and their relevant characteristics analyzed to provide decision information, including resource types, Content-Length attributes, cookies, return codes, and cache suggestions. For example, the direction of the traffic is analyzed, and if the cross-province or cross-network traffic is large (generally, more than 30% of the total traffic is regarded as large; this threshold can be adjusted to actual conditions), caching of the domain name corresponding to that traffic is suggested.
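The ranking of outgoing traffic and the 30% rule can be sketched as below. Two assumptions are made: each flow arrives already labelled as crossing or not crossing the province or network boundary, and "30% of the total traffic" is read as 30% of that domain's own outgoing bytes (the source could also be read as a share of whole-network traffic). The function name is hypothetical.

```python
from collections import defaultdict


def rank_outgoing(flows, cross_share_threshold=0.30):
    """Rank domains by outgoing bytes and flag heavy cross-network traffic.

    flows: iterable of (domain, nbytes, crosses) where `crosses` marks traffic
    that leaves the province or network (an assumed, pre-labelled input).
    Returns (domain, total_bytes, cross_share, suggest_cache) sorted by bytes.
    """
    total = defaultdict(int)
    crossing = defaultdict(int)
    for domain, nbytes, crosses in flows:
        total[domain] += nbytes
        if crosses:
            crossing[domain] += nbytes
    report = []
    for domain, nbytes in sorted(total.items(), key=lambda kv: kv[1], reverse=True):
        share = crossing[domain] / nbytes if nbytes else 0.0
        report.append((domain, nbytes, share, share > cross_share_threshold))
    return report


flows = [
    ("a.example", 100, True), ("a.example", 100, False),
    ("b.example", 500, False), ("b.example", 100, True),
]
for row in rank_outgoing(flows):
    print(row)
```

Here b.example ranks first by volume but stays local, while a.example sends half of its traffic across the boundary and is therefore flagged for caching.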
When the cacheability of a domain name cannot be judged because of a problem, the information should be returned promptly to the maintainers for handling. Specifically, for a domain name whose cacheability cannot be judged, the collected information such as the URL, resource type and max-age is returned to the maintainers, who should identify and diagnose the problem promptly and update the caching algorithm in time, so that the accuracy of the cache engine keeps improving.
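The hand-off to maintainers can be sketched as a guard in front of the judgment. The source does not say which problems make a domain undeterminable, so treating "no records" or "a missing max-age" as the trigger is an assumption, as are the record keys and the `judge_or_escalate` name.

```python
def judge_or_escalate(domain, records, judge_fn):
    """Judge a domain, or return its collected details for the maintainers.

    records: list of dicts with "url", "resource_type" and "max_age" keys;
    no records, or a missing/None max-age, counts as "cannot judge" (assumption).
    judge_fn: callable turning complete records into a cache suggestion.
    """
    if not records or any(r.get("max_age") is None for r in records):
        details = [{k: r.get(k) for k in ("url", "resource_type", "max_age")}
                   for r in records]
        return {"domain": domain, "status": "undetermined", "details": details}
    return {"domain": domain, "status": judge_fn(records)}


report = judge_or_escalate(
    "c.example",
    [{"url": "/x.js", "resource_type": "script", "max_age": None}],
    judge_fn=lambda recs: "suggested cache",
)
print(report["status"], report["details"])
```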
Based on the above analysis, the method for analyzing the cacheability of a domain name provided by the embodiment of the present invention has the following beneficial effects: (1) it can judge whether a domain name is suitable for caching and give a cache suggestion; (2) the cacheability judgment results can be output as a list for the user to check conveniently; (3) it facilitates subsequent adjustment of cacheable domain names, forming a dynamic cacheable domain name library in which cacheable domain names are added and non-cacheable ones deleted, thereby reducing caching of domain names with low efficiency and low cache hit rates, avoiding waste of resources such as memory, increasing caching of domain names with high efficiency and high cache hit rates, and improving domain name access efficiency; (4) the domain name comparison function of the cacheable domain name library allows a stored cacheability judgment to be reused directly, reducing repeated analysis and judgment and improving the efficiency of domain name cacheability judgment.
Embodiment two
Corresponding to the method of the first embodiment, an embodiment of the present invention further provides a system for analyzing the cacheability of a domain name, configured to execute that method. Fig. 3 is a schematic diagram of the module composition of this system according to the second embodiment. As shown in Fig. 3, the system includes: an acquisition module 100, configured to collect domain name data and import it into an original domain name database; an extraction module 200, configured to extract the domain name to be analyzed and its cacheable feature data from the original domain name data and to store them temporarily; a calculation and judgment module 300, configured to calculate a cache value of the domain name from the cacheable feature data obtained by the extraction module and the cacheable analysis algorithm, and to judge the cacheability of the domain name by comparing the cache value with a preset value; and an output module 400, configured to output the analysis and judgment result of the cacheability of the domain name to be analyzed obtained by the calculation and judgment module.
Preferably, the acquisition module 100 may include a crawler module, a DNS log analysis module, a DPI data analysis module and a packet capture module, which collect network resource data by crawling, DNS log analysis, DPI data and packet capture, respectively.
Preferably, the cacheable feature data of the domain name to be analyzed obtained by the extraction module 200 includes the URL, the internet resource type, the resource attribute and the expiration time of the resource document. The internet resource type is classified according to resource service features. The resource attribute is either dynamic or static: a dynamic resource is one whose data is transformed before it is called, whereas a static resource is called directly. For resources whose attribute is static, the cacheable feature data further includes the resource file length and the resource file change period. Static resources are divided into cacheable objects and non-cacheable objects: a cacheable object is a static resource whose file change period exceeds a set period, and a non-cacheable object is a static resource whose file change period is less than or equal to the set period.
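Since the document equates the expiration time of a static document with its max-age, the extraction step needs to read that value from each response. The sketch below assumes it comes from the HTTP `Cache-Control` response header; the `max_age_seconds` helper is hypothetical, and it deliberately ignores other directives (`no-store`, `no-cache`, `s-maxage`) and the `Expires` header, which a fuller implementation would have to weigh.

```python
def max_age_seconds(cache_control):
    """Return the max-age directive of a Cache-Control header value, or None.

    Other directives and the Expires header are ignored in this sketch.
    """
    if not cache_control:
        return None
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age":
            value = value.strip().strip('"')  # tolerate the quoted form
            return int(value) if value.isdecimal() else None
    return None


for header in ("public, max-age=600", "s-maxage=30, max-age=60", "no-store", None):
    print(repr(header), "->", max_age_seconds(header))
```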
Fig. 4 is a schematic diagram of the composition of the calculation and judgment module in the system of the second embodiment. As shown in Fig. 4, the calculation and judgment module 300 includes a calculation unit and a judgment unit, and the calculation unit includes a first calculation unit and a second calculation unit.
A first calculation unit 301, configured to calculate two indexes from the cacheable feature data of the domain name to be analyzed obtained by the extraction module, namely the proportion of static resources in the total number of resources and the proportion of documents in the static resources whose expiration time exceeds the cache time limit, and to obtain the cache value of the domain name by weighted summation of the two index values according to their weights;
when the cache value is greater than the preset cache threshold, the cacheability of the domain name is judged as a suggested cache, and when the cache value is less than or equal to the preset cache threshold, it is judged as a non-suggested cache.
A second calculation unit 302, configured to calculate five indexes from the cacheable feature data of the domain name to be analyzed, namely the proportion of static resources in the total number of resources, the proportion of cacheable objects among the static resources, the proportion of non-cacheable objects among the static resources, the proportion of large files among the cacheable objects in the static resources relative to the total number of cacheable objects in the static resources, and the proportion of documents in the static resources whose expiration time exceeds the cache time limit, and then to calculate the cache value of the domain name from the five indexes by the weighted-sum formula.
A judgment unit 303, configured to judge according to the cache value calculated by the first calculation unit 301 or the second calculation unit 302: when the cache value is greater than the preset cache threshold, the cacheability of the domain name to be analyzed is judged as a suggested cache, and when the cache value is less than or equal to the preset cache threshold, it is judged as a non-suggested cache.
In this embodiment, the acquisition module 100 collects domain name data and imports it into the original domain name database; the extraction module 200 then extracts the domain name to be analyzed and its cacheable feature data from that database; the calculation and judgment module 300 calculates the cache value of the domain name from the cacheable feature data using the cacheable analysis algorithm and judges its cacheability by comparing the cache value with a preset value; finally, the output module 400 outputs the judgment result. The system can therefore judge whether a domain name is suitable for caching and give a cache suggestion. Following the suggestion, cacheable domain names are added and non-cacheable ones deleted, which reduces caching of domain names with low efficiency and low cache hit rates, avoids wasting resources such as memory, increases caching of domain names with high efficiency and high cache hit rates, and improves access efficiency.
To reduce repeated analysis and judgment and improve efficiency, the system of the second embodiment further includes a library building module, configured to build a cacheable domain name library from the cacheability judgment results output by the output module. The library is used to store, query, update and analyze cacheable domain names, dynamically maintaining the latest set of cacheable domain names by adding cacheable domain names and deleting non-cacheable ones. It may include the domain name and one or more of the domain name's attribution, resource type, cacheable proportion, HTTPS availability, server port, cache suggestion and probe time. After collecting domain name data, the acquisition module compares it with the domain names in the cacheable domain name library: if information for the domain name to be analyzed is stored in the library, that information is sent directly to the output module; if not, the collected domain name data is imported into the original domain name database and the extraction module is called to perform the cacheability analysis.
Based on the above analysis, the system for analyzing the cacheability of a domain name provided by the embodiment of the present invention has the following beneficial effects: (1) it can judge whether a domain name is suitable for caching and give a cache suggestion; (2) the cacheability judgment results can be output as a list for the user to check conveniently; (3) it facilitates subsequent adjustment of domain names by adding cacheable domain names and deleting non-cacheable ones, thereby reducing caching of domain names with low efficiency and low cache hit rates, avoiding waste of resources such as memory, increasing caching of domain names with high efficiency and high cache hit rates, and improving access efficiency; (4) the domain name comparison function of the domain name library allows a stored cacheability judgment to be reused directly, reducing repeated analysis and judgment and improving the efficiency of domain name cacheability judgment.
The system for analyzing the cacheability of a domain name provided by the embodiment of the present invention may be specific hardware on a device, or software or firmware installed on a device. The system has the same implementation principle and technical effects as the method embodiment; for brevity, where the system embodiment does not mention something, reference may be made to the corresponding content of the method embodiment. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems and units described above may refer to the corresponding processes in the method embodiments and are not repeated here.
In the embodiments provided by the present invention, it should be understood that the disclosed method and system may be implemented in other ways. The system embodiments described above are merely illustrative: for example, the division into units is only a logical functional division, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection of systems or units through some communication interfaces, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
If the functions are implemented as software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that like reference numbers and letters denote like items in the figures, so once an item is defined in one figure, it need not be further defined or explained in subsequent figures. Moreover, the terms "first", "second", "third", etc. are used only to distinguish descriptions and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the embodiments described above are only specific implementations of the present invention, intended to illustrate rather than limit its technical solutions, and the scope of protection of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention and are intended to be covered by the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
Claims (9)
1. A method for analyzing cacheability of a domain name, the method comprising:
the collection step comprises: collecting domain name data, and importing the domain name data into an original domain name database;
the extraction step comprises: extracting, from the original domain name data, the domain name to be analyzed and its cacheable feature data, and temporarily storing them;
the cacheable feature data comprises a URL (Uniform Resource Locator), an internet resource type, a resource attribute and an expiration time of a resource document, wherein the internet resource type is classified according to resource service features, and the resource attribute is either a dynamic resource or a static resource, the dynamic resource being a resource whose data is converted before it is called and the static resource being a resource that is called directly;
the calculation and judgment step comprises: calculating a cache value of the domain name according to the cacheable feature data of the domain name to be analyzed and a cacheable analysis algorithm, and judging the cacheability of the domain name by comparing the cache value with a preset value;
the cacheable analysis algorithm specifically comprises: calculating two indexes from the cacheable feature data of the domain name to be analyzed, namely the proportion of the number of static resources in the total number of URL resources and the proportion of documents in the static resources whose expiration time is greater than the cache time limit; and obtaining the cache value of the domain name by weighted summation of the two index values according to their weights;
when the cache value is larger than a preset cache threshold value, judging that the cacheability of the domain name is a suggested cache, and when the cache value is smaller than or equal to the preset cache threshold value, judging that the cacheability of the domain name is a non-suggested cache;
the output step comprises: outputting the analysis and judgment result of the cacheability of the domain name to be analyzed.
2. The method according to claim 1, wherein, for resources whose resource attribute is static, the cacheable feature data further comprises a resource file length and a resource file change period; the static resources are divided into cacheable objects and non-cacheable objects, a cacheable object being a static resource whose resource file change period exceeds a set period, and a non-cacheable object being a static resource whose resource file change period is less than or equal to the set period;
the calculating and judging step comprises the following steps:
calculating five indexes from the cacheable feature data of the domain name to be analyzed, namely the proportion of the number of static resources in the total number of URL resources, the proportion of cacheable objects among the static resources, the proportion of non-cacheable objects among the static resources, the proportion of large files among the cacheable objects in the static resources relative to the total number of cacheable objects in the static resources, and the proportion of files in the static resources whose expiration time is greater than the cache time limit;
obtaining the cache value of the domain name by weighted summation of the five index values according to their weights;
and when the cache value is larger than a preset cache threshold value, judging that the cacheability of the domain name is a suggested cache, and when the cache value is smaller than or equal to the preset cache threshold value, judging that the cacheability of the domain name is a non-suggested cache.
3. The method according to claim 1 or 2, further comprising, after the output step, a step of establishing a cacheable domain name library: establishing a cacheable domain name library according to the cacheability judgment result of the domain name, wherein the cacheable domain name library is used for storing, querying, updating and analyzing cacheable domain names and dynamically maintaining the latest cacheable domain names; after the cacheable domain name library is established, the collection step is as follows: after the domain name data is collected, it is compared with the domain names in the cacheable domain name library; if domain name information corresponding to the domain name to be analyzed is stored in the cacheable domain name library, the method proceeds directly to the output step; and if no such information exists in the cacheable domain name library, the collected domain name data is imported into the original domain name database.
4. The method of claim 1, wherein the collection step collects network resource data by means of crawling, DNS logs, DPI data and packet capture.
5. The method of claim 1, further comprising: extracting the whole-network outgoing traffic from the original domain name database, counting and analyzing the outgoing request information, providing a ranking of the whole-network outgoing traffic, and providing a basis for the user to decide whether to cache a website.
6. A system for cacheability analysis of a domain name, the system comprising:
the acquisition module is used for acquiring domain name data and importing the domain name data into an original domain name database;
the extraction module is used for extracting the domain name to be analyzed and the cacheable feature data from the original domain name data and temporarily storing the cacheable feature data and the domain name;
the cacheable feature data comprises a URL (Uniform Resource Locator), an internet resource type, a resource attribute and an expiration time of a resource document, wherein the internet resource type is classified according to resource service features, and the resource attribute is either a dynamic resource or a static resource, the dynamic resource being a resource whose data is converted before it is called and the static resource being a resource that is called directly;
the calculation and judgment module is used for calculating the cache value of the domain name according to the cacheable feature data of the domain name to be analyzed obtained by the extraction module and the cacheable analysis algorithm, and judging the cacheability of the domain name by comparing the cache value with a preset value;
the cacheable analysis algorithm specifically comprises: calculating two indexes from the cacheable feature data of the domain name to be analyzed, namely the proportion of the number of static resources in the total number of URL resources and the proportion of documents in the static resources whose expiration time is greater than the cache time limit; and obtaining the cache value of the domain name by weighted summation of the two index values according to their weights;
when the cache value is larger than a preset cache threshold, judging that the cacheability of the domain name is a suggested cache, and when the cache value is smaller than or equal to the preset cache threshold, judging that the cacheability of the domain name is a non-suggested cache;
and the output module is used for outputting the analysis and judgment result of the cacheability of the domain name to be analyzed, which is obtained by the calculation and judgment module.
7. The system according to claim 6, wherein, for resources whose resource attribute is static, the cacheable feature data further comprises a resource file length and a resource file change period; the static resources are divided into cacheable objects and non-cacheable objects, a cacheable object being a static resource whose resource file change period exceeds a set period, and a non-cacheable object being a static resource whose resource file change period is less than or equal to the set period.
8. The system according to claim 6, wherein the calculation judgment module comprises a calculation unit and a judgment unit, the calculation unit comprises a first calculation unit and a second calculation unit:
the first calculation unit is used for calculating two indexes from the cacheable feature data of the domain name to be analyzed, namely the proportion of the number of static resources in the total number of resources and the proportion of documents in the static resources whose expiration time is greater than the cache time limit, and obtaining the cache value of the domain name by weighted summation of the two index values according to their weights;
the second calculation unit is used for calculating five indexes from the cacheable feature data of the domain name to be analyzed, namely the proportion of the number of static resources in the total number of resources, the proportion of cacheable objects among the static resources, the proportion of non-cacheable objects among the static resources, the proportion of large files among the cacheable objects in the static resources relative to the total number of cacheable objects in the static resources, and the proportion of files in the static resources whose expiration time is greater than the cache time limit, and obtaining the cache value of the domain name by weighted summation of the five index values according to their weights;
and the judgment unit is configured to evaluate the cache value obtained by the first calculation unit or the second calculation unit: when the cache value is greater than a preset cache threshold, the cacheability of the domain name is judged to be "suggested cache"; when the cache value is less than or equal to the preset cache threshold, it is judged to be "non-suggested cache".
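The two calculation units and the judgment unit of claim 8 can be sketched as indicator ratios, a weighted sum, and a strict threshold comparison. This is an illustrative Python sketch under stated assumptions, not the claimed system: the claims give no concrete weights, threshold, large-file size, or cache time limit, so all of these are caller-supplied parameters here; "large file" is assumed to mean a file length above a hypothetical `large_file_bytes`; the expiration indicator is assumed to be measured over static resources; and the claims do not say whether the non-cacheable proportion should carry a negative weight, so the sign is left to the caller.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Resource:
    # Hypothetical per-resource feature record.
    is_static: bool          # resource attribute: static vs. dynamic
    length_bytes: int        # resource file length
    change_period_s: float   # resource file change period (seconds)
    expires_s: float         # document expiration time (seconds), e.g. from max-age


def _ratio(num: int, den: int) -> float:
    # Return 0.0 instead of dividing by zero (e.g. a domain with no static resources).
    return num / den if den else 0.0


def two_indicators(res: Sequence[Resource], cache_limit_s: float) -> List[float]:
    """Indicators used by the first calculation unit."""
    static = [r for r in res if r.is_static]
    return [
        _ratio(len(static), len(res)),
        _ratio(sum(r.expires_s > cache_limit_s for r in static), len(static)),
    ]


def five_indicators(res: Sequence[Resource], cache_limit_s: float,
                    set_period_s: float, large_file_bytes: int) -> List[float]:
    """Indicators used by the second calculation unit."""
    static = [r for r in res if r.is_static]
    cacheable = [r for r in static if r.change_period_s > set_period_s]
    non_cacheable = [r for r in static if r.change_period_s <= set_period_s]
    large = [r for r in cacheable if r.length_bytes > large_file_bytes]
    return [
        _ratio(len(static), len(res)),          # static / all resources
        _ratio(len(cacheable), len(static)),    # cacheable objects / static
        _ratio(len(non_cacheable), len(static)),  # non-cacheable objects / static
        _ratio(len(large), len(cacheable)),     # large files / cacheable objects
        _ratio(sum(r.expires_s > cache_limit_s for r in static), len(static)),
    ]


def cache_value(indicators: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of indicator values."""
    if len(indicators) != len(weights):
        raise ValueError("exactly one weight per indicator is required")
    return sum(i * w for i, w in zip(indicators, weights))


def judge(value: float, threshold: float) -> str:
    # Strictly greater than the threshold means "suggested cache"; otherwise not.
    return "suggested cache" if value > threshold else "non-suggested cache"
```

For example, with three static resources and one dynamic resource, of which one static file expires after the cache time limit, the first unit's indicators are 0.75 and 1/3; with purely illustrative weights of 0.5 each the cache value is about 0.54, which a hypothetical threshold of 0.5 would judge as "suggested cache".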
9. The system according to claim 6 or 8, further comprising a database building module configured to build a cacheable domain name database from the cacheability judgment results output by the output module, to store, query, update, and analyze cacheable domain names, and to dynamically maintain the latest set of cacheable domain names; the acquisition module checks the acquired domain name data against the domain names in the cacheable domain name database: if domain name information corresponding to the domain name to be analyzed is stored in the cacheable domain name database, the queried domain name information is sent directly to the output module; if no such information exists in the cacheable domain name database, the acquired domain name data are imported into the original domain name database and the extraction module is called.
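Claim 9 describes a query-first flow: look the domain up in the cacheable domain name database, and only on a miss import the raw data and run the full analysis. The Python sketch below illustrates that control flow under assumptions: `CacheableDomainStore` is a hypothetical in-memory stand-in for the database, `extract_and_judge` is a hypothetical callback standing in for the extraction and calculation/judgment modules, and only domains judged "suggested cache" are stored, which is one possible reading of "cacheable domain name database".

```python
from typing import Callable, Dict, List, Optional, Tuple


class CacheableDomainStore:
    """Hypothetical in-memory stand-in for the cacheable domain name database."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def query(self, domain: str) -> Optional[str]:
        return self._entries.get(domain)

    def update(self, domain: str, info: str) -> None:
        self._entries[domain] = info


def analyze_domain(
    domain: str,
    raw_data: dict,
    store: CacheableDomainStore,
    original_db: List[Tuple[str, dict]],
    extract_and_judge: Callable[[dict], str],
) -> str:
    """Query-first flow: reuse a stored result, otherwise run the full analysis."""
    hit = store.query(domain)
    if hit is not None:
        # Found in the cacheable domain name database: send straight to output.
        return hit
    # Not found: import the raw data into the original domain name database,
    # then hand over to extraction and calculation/judgment.
    original_db.append((domain, raw_data))
    verdict = extract_and_judge(raw_data)
    if verdict == "suggested cache":
        # Assumption: only domains judged cacheable are kept in the store.
        store.update(domain, verdict)
    return verdict
```

A repeated request for a domain already judged cacheable is answered from the store without touching the original domain name database or re-running the analysis.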
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810720010.0A CN110677270B (en) | 2018-07-03 | 2018-07-03 | Domain name cacheability analysis method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110677270A | 2020-01-10 |
CN110677270B | 2023-02-28 |
Family ID: 69065877
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810720010.0A CN110677270B (en), Active | 2018-07-03 | 2018-07-03 | Domain name cacheability analysis method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110677270B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111741065B (en) * | 2020-05-18 | 2022-03-08 | 北京直真科技股份有限公司 | Batch CDN resource cache automation device |
CN114629919A (en) * | 2022-03-31 | 2022-06-14 | 北京百度网讯科技有限公司 | Resource acquisition method, device, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106921713A (en) * | 2015-12-25 | 2017-07-04 | 中国移动通信集团上海有限公司 | Resource caching method and device |
US9723053B1 (en) * | 2013-08-30 | 2017-08-01 | Amazon Technologies, Inc. | Pre-fetching a cacheable network resource based on a time-to-live value |
CN107153663A (en) * | 2016-03-04 | 2017-09-12 | 中国移动通信集团北京有限公司 | Domain name resource caching method and device |
CN107819837A (en) * | 2017-10-31 | 2018-03-20 | 南京优速网络科技有限公司 | Method for improving cache service quality and log cache analysis system |
Non-Patent Citations (1)
Title |
---|
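Construction of automated means for autonomous cache operation; Li Kai (李凯) et al.; 电信技术 (Telecommunications Technology); 2015-09-30; full text * |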
Similar Documents
Publication | Title |
---|---|
US11809504B2 (en) | Auto-refinement of search results based on monitored search activities of users |
US7093012B2 (en) | System and method for enhancing crawling by extracting requests for webpages in an information flow |
US10089579B1 (en) | Predicting user navigation events |
AU2001290363A1 (en) | A method for searching and analysing information in data networks |
US20130144834A1 (en) | Uniform resource locator canonicalization |
CN110430188B (en) | Rapid URL filtering method and device |
US8438336B2 (en) | System and method for managing large filesystem-based caches |
KR100509276B1 (en) | Method for searching web page on popularity of visiting web pages and apparatus thereof |
CN102752288A (en) | Method and device for identifying network access action |
CN111831699B (en) | Data caching method, electronic equipment and computer readable medium |
EP2802979A2 (en) | Processing store visiting data |
CN111368227B (en) | URL processing method and device |
CN106649313B (en) | Method and apparatus for processing cache data |
CN110677270B (en) | Domain name cacheability analysis method and system |
CN104202418B (en) | Recommend the method and system of the content distributing network of business for content supplier |
CN112749360A (en) | Webpage classification method and device |
Langhnoja et al. | Web usage mining to discover visitor group with common behavior using DBSCAN clustering algorithm |
CN104468857B (en) | A kind of acquisition methods and system of correspondence |
JP3664906B2 (en) | Information source observation apparatus, information source observation method, and recording medium storing a program for executing information source observation processing |
JP3666638B2 (en) | Information source observation apparatus, information source observation method, and computer-readable recording medium recording information source observation program |
JP4286828B2 (en) | Web page patrol device and web page patrol program |
CN110955855A (en) | Information interception method, device and terminal |
KR102093166B1 (en) | A method for reducing connection time to website and an apparatus for the method |
JP5165717B2 (en) | Dead link determination apparatus and method |
CN104392000A (en) | Method and device for determining catching quota of mobile station |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |