CN110677270A - Domain name cacheability analysis method and system - Google Patents

Domain name cacheability analysis method and system

Info

Publication number
CN110677270A
Authority
CN
China
Prior art keywords
domain name
cacheable
resource
cache
cacheability
Legal status
Granted
Application number
CN201810720010.0A
Other languages
Chinese (zh)
Other versions
CN110677270B (en)
Inventor
章建功
李萍
丁健
齐超
姜帆
Current Assignee
Changchun Yiyang Computer Development Co Ltd
Original Assignee
Changchun Yiyang Computer Development Co Ltd
Application filed by Changchun Yiyang Computer Development Co Ltd
Priority to CN201810720010.0A
Publication of CN110677270A
Application granted
Publication of CN110677270B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a system for analyzing the cacheability of a domain name. The method comprises the following steps: collecting domain name data and importing it into an original domain name database; extracting, from the domain name data, the domain name to be analyzed and its cacheable feature data, and temporarily storing both; calculating a cache value of the domain name from the cacheable feature data using a cacheability analysis algorithm, and judging the cacheability of the domain name by comparing the cache value with a preset value; and outputting the cacheability judgment result. The method and system can determine whether a domain name is suitable for caching and provide a caching suggestion.

Description

Domain name cacheability analysis method and system
Technical Field
The invention relates to the field of Internet communication technology, and in particular to a method and a system for analyzing the cacheability of a domain name.
Background
A domain name is the name of a computer or group of computers on the Internet. It consists of a string of labels separated by dots and identifies the electronic location of the computer during data transmission. The domain name is an important identifier of Internet access units and individuals on the network: it serves for identification and makes it easy for others to find and retrieve the information resources of an enterprise, organization, or individual, which facilitates resource sharing on the network.
The domain name system is an address translation scheme established to make addresses easy to remember, and domain name resolution is the service that points a domain name to the IP (Internet Protocol) address of a website's hosting space, so that people can access websites through registered domain names. Access to any server on the Internet must ultimately go through an IP address, and domain name resolution is the process of converting a domain name into that IP address. The domain name cache is an optimization mechanism by which the domain name system stores data so that it can be read quickly and repeated resource requests can be avoided; an effective cache avoids repeated resolution requests and returns IP addresses quickly. Caching domain names indiscriminately, however, can occupy a large amount of memory, and at present there is no method for analyzing whether a domain name is suitable for caching.
Without such a method, it cannot be determined whether a given domain name is suitable for caching.
Disclosure of Invention
In view of the above, the present invention provides a method and a system for analyzing cacheability of a domain name to determine whether the domain name is suitable for caching and to provide a caching suggestion.
To solve the foregoing problem, in a first aspect, an embodiment of the present invention provides a method for analyzing the cacheability of a domain name. The method includes: collecting domain name data and importing it into an original domain name database; extracting, from the original domain name data, the domain name to be analyzed and its cacheable feature data, and temporarily storing both; calculating a cache value of the domain name from the cacheable feature data using a cacheability analysis algorithm, and judging the cacheability of the domain name by comparing the cache value with a preset value; and outputting the analysis and judgment result for the domain name to be analyzed.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, in which the cacheable feature data includes a URL (Uniform Resource Locator), an Internet resource type, a resource attribute, and a document expiration time. The Internet resource type is classified according to the service characteristics of the resource. The resource attribute is either dynamic or static: a dynamic resource is one that is invoked after data conversion, and a static resource is one that is invoked directly.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, in which the calculating and judging step includes: calculating two indexes from the cacheable feature data of the domain name to be analyzed, namely the proportion of static resources among all URL resources, and the proportion of static resources whose document expiration time is greater than the cache time limit;
calculating the cache value of the domain name by weighted summation of the two index values with their respective weights; and judging the cacheability of the domain name as "suggested cache" when the cache value is greater than a preset cache threshold, and as "non-suggested cache" when the cache value is less than or equal to the preset cache threshold.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, in which, when the resource attribute of the cacheable feature data is static, the data further includes a resource file length attribute and a resource file change period attribute. Static resources are divided into cacheable objects and non-cacheable objects: a cacheable object is a static resource whose file change period exceeds a set period, and a non-cacheable object is a static resource whose file change period is less than or equal to the set period. The calculating and judging step includes:
calculating five indexes from the cacheable feature data of the domain name to be analyzed, namely the proportion of static resources among all URL resources, the proportion of cacheable objects among static resources, the proportion of non-cacheable objects among static resources, the proportion of large files among the cacheable objects in the static resources, and the proportion of static resources whose document expiration time is greater than the cache time limit; calculating the cache value of the domain name from the five indexes using a weighted-sum formula; and judging the cacheability of the domain name as "suggested cache" when the cache value is greater than a preset cache threshold, and as "non-suggested cache" when the cache value is less than or equal to the preset cache threshold.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, in which the method further includes, after the outputting step, a step of establishing a cacheable domain name library: the library is built from the cacheability judgment results and is used to store, query, update, and analyze cacheable domain names and to dynamically maintain the current, most recent set of cacheable domain names. Once the library exists, the collecting step works as follows: after domain name data is collected, it is compared with the domain names in the cacheable domain name library; if the library already holds information for the domain name to be analyzed, the method proceeds directly to the outputting step; otherwise, the collected domain name data is imported into the original domain name database.
Each entry in the cacheable domain name library may include the domain name together with one or more of the following: the attribution of the domain name, the resource type, the cacheable proportion, whether HTTPS (Hypertext Transfer Protocol over Secure Socket Layer) is used, the server port, the cache suggestion, and the probe time.
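The library lookup in the fourth implementation can be sketched as follows. This is a minimal in-memory sketch, not the patented system: a Python dict stands in for the cacheable domain name library, a list stands in for the original domain name database, and the names `CacheableDomainEntry`, `CacheableDomainLibrary`, `analyze_with_library`, and the `analyze` callback are hypothetical. It also assumes that every judged domain is written back to the library together with its cache suggestion.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CacheableDomainEntry:
    """One library entry; field names are illustrative, not from the patent."""
    domain: str
    attribution: Optional[str] = None
    resource_type: Optional[str] = None
    cacheable_proportion: Optional[float] = None
    https: Optional[bool] = None
    server_port: Optional[int] = None
    cache_suggestion: Optional[str] = None
    probe_time: Optional[str] = None


class CacheableDomainLibrary:
    """In-memory stand-in for the cacheable domain name library."""

    def __init__(self) -> None:
        self._entries: Dict[str, CacheableDomainEntry] = {}

    def upsert(self, entry: CacheableDomainEntry) -> None:
        # Domain names are case-insensitive, so key on the lower-cased name.
        self._entries[entry.domain.lower()] = entry

    def query(self, domain: str) -> Optional[CacheableDomainEntry]:
        return self._entries.get(domain.lower())


def analyze_with_library(
    domain: str,
    library: CacheableDomainLibrary,
    original_db: List[str],
    analyze: Callable[[str], CacheableDomainEntry],
) -> CacheableDomainEntry:
    hit = library.query(domain)
    if hit is not None:
        return hit  # library hit: go straight to the output step
    original_db.append(domain)  # miss: import into the original database
    entry = analyze(domain)  # stands in for the extraction and judgment steps
    library.upsert(entry)  # keep the library current
    return entry
```

A second lookup for the same domain, in any letter case, returns the stored entry without calling `analyze` again.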
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, in which the collecting step acquires network resource data by crawling, from DNS logs, from DPI (deep packet inspection) data, and by packet capture.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, in which the total outgoing network traffic is extracted from the original domain name database and the outgoing request information is statistically analyzed to produce a ranking of total outgoing network traffic, giving the user a basis for deciding whether to cache a website.
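A minimal sketch of the traffic ranking in the sixth implementation, assuming the outgoing request information has already been reduced to hypothetical `(domain, bytes)` pairs; the function name and the tie-breaking rule are our own choices.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_outgoing_traffic(
    requests: Iterable[Tuple[str, int]], top_n: int = 10
) -> List[Tuple[str, int, int]]:
    """Return (domain, total_bytes, request_count), sorted by total bytes, descending."""
    total_bytes: Counter = Counter()
    request_count: Counter = Counter()
    for domain, nbytes in requests:
        total_bytes[domain] += nbytes
        request_count[domain] += 1
    # Sort by traffic (descending), then by name so that ties are deterministic.
    ranked = sorted(total_bytes.items(), key=lambda item: (-item[1], item[0]))
    return [(d, b, request_count[d]) for d, b in ranked[:top_n]]


log = [("a.example", 100), ("b.example", 300), ("a.example", 250), ("c.example", 50)]
print(rank_outgoing_traffic(log, top_n=2))
# → [('a.example', 350, 2), ('b.example', 300, 1)]
```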
In a second aspect, an embodiment of the present invention provides a system for analyzing the cacheability of a domain name. The system includes: an acquisition module, configured to collect domain name data and import it into an original domain name database; an extraction module, configured to extract the domain name to be analyzed and its cacheable feature data from the domain name data and to temporarily store both; a calculation and judgment module, configured to calculate the cache value of the domain name from the cacheable feature data obtained by the extraction module using the cacheability analysis algorithm, and to judge the cacheability of the domain name by comparing the cache value with a preset value; and an output module, configured to output the cacheability analysis and judgment result produced by the calculation and judgment module.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, in which the cacheable feature data includes a URL, an Internet resource type, a resource attribute, and a resource document expiration time. The Internet resource type is classified according to the service characteristics of the resource. The resource attribute is either dynamic or static: a dynamic resource is invoked after data conversion, and a static resource is invoked directly. For static resources, the cacheable feature data further includes a resource file length attribute and a resource file change period attribute, and static resources are divided into cacheable objects, whose file change period exceeds a set period, and non-cacheable objects, whose file change period is less than or equal to the set period.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, in which the calculation and judgment module includes a calculation unit and a judgment unit, and the calculation unit includes a first calculation unit and a second calculation unit:
the first calculation unit is configured to calculate two indexes from the cacheable feature data of the domain name to be analyzed, namely the proportion of static resources among all resources and the proportion of static resources whose document expiration time is greater than the cache time limit, and to calculate the cache value of the domain name by weighted summation of the two index values with their respective weights.
When the cache value is greater than a preset cache threshold, the cacheability of the domain name is judged as "suggested cache"; when the cache value is less than or equal to the preset cache threshold, it is judged as "non-suggested cache".
The second calculation unit is configured to calculate five indexes from the cacheable feature data of the domain name to be analyzed, namely the proportion of static resources among all resources, the proportion of cacheable objects among static resources, the proportion of non-cacheable objects among static resources, the proportion of large files among the cacheable objects in the static resources, and the proportion of static resources whose document expiration time is greater than the cache time limit, and to calculate the cache value of the domain name by weighted summation of the five index values with their respective weights.
The judgment unit is configured to make the judgment from the cache value obtained by the first or the second calculation unit: the cacheability of the domain name is judged as "suggested cache" when the cache value is greater than a preset cache threshold, and as "non-suggested cache" when it is less than or equal to that threshold.
With reference to the foregoing implementations of the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, in which the system further includes a library establishing module, configured to establish a cacheable domain name library from the cacheability judgment results output by the output module. The library is used to store, query, update, and analyze cacheable domain names and to dynamically maintain the current, most recent set of cacheable domain names. The acquisition module compares the collected domain name data with the domain names in the cacheable domain name library; if the library holds information for the domain name to be analyzed, the acquisition module sends the retrieved information directly to the output module; otherwise, it imports the collected domain name data into the original domain name database and invokes the extraction module.
With reference to the second aspect, an embodiment of the present invention provides a fourth possible implementation of the second aspect, in which the acquisition module includes a crawler module, a DNS log analysis module, a DPI data analysis module, and a packet capture module, which collect network resource data by crawling, DNS log analysis, DPI data analysis, and packet capture, respectively.
In the embodiments of the invention, domain name data is first collected and imported into an original domain name database; the domain name to be analyzed and its cacheable feature data are then extracted from the original domain name data; a cache value of the domain name is calculated from the cacheable feature data using a cacheability analysis algorithm; the cacheability of the domain name is judged from the cache value; and finally the judgment result is output. The method and system provided by the embodiments can therefore determine whether a domain name is suitable for caching and give a caching suggestion. Acting on that suggestion, by adding domain names recommended for caching and removing those that are not, reduces the caching of inefficient domain names with low cache hit rates, which avoids wasting memory and other resources, and increases the caching of efficient domain names with high cache hit rates, which improves access efficiency.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart illustrating a method for cacheability analysis of a domain name according to a first embodiment of the present invention;
Fig. 2 is a detailed flowchart illustrating step S300 in a method for analyzing cacheability of a domain name according to a first embodiment of the present invention;
Fig. 3 is a schematic block diagram illustrating a cacheability analysis system for domain names according to a second embodiment of the present invention;
Fig. 4 is a schematic diagram illustrating the calculation and judgment module in a system for analyzing cacheability of a domain name according to a second embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Because there is currently no method for analyzing whether a domain name is suitable for caching, embodiments of the present invention provide a method and a system for analyzing the cacheability of a domain name, which are described in detail below with reference to the embodiments.
Example one
Fig. 1 is a flowchart illustrating a method for analyzing cacheability of a domain name according to a first embodiment of the present invention, where the method includes the following steps:
Step S100, collecting step: collect domain name data and import it into an original domain name database.
To analyze the cacheability of a domain name, domain name data is collected first, organized into records with the required data structure, and imported into the original domain name database; cacheable feature data can be obtained at the same time as the domain name data is collected. Each domain name and its data are stored as one record.
An example domain name data record has the following fields: the domain name, the attribution of the domain name, the URL, the Internet resource type, the resource attribute, whether HTTPS is used, the expiration time of the resource document, the server port, the probe time, and so on; this fixes both the field contents and the field order of the data structure. Some of these fields, such as the URL, Internet resource type, resource attribute, and document expiration time, carry cacheability features.
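The record layout above could be modelled as a fixed-order dataclass. This is a sketch under assumptions: the field names, the use of seconds for the expiration time, and the helper `cacheable_features` are hypothetical; the helper simply projects out the four fields the text identifies as cacheability features.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class DomainRecord:
    """One row of the original domain name database (field names are illustrative)."""
    domain: str
    attribution: str             # owner or region of the domain name
    url: str
    resource_type: str           # e.g. "video", "web page"
    resource_attribute: str      # "static" or "dynamic"
    https: bool
    expires_in_s: Optional[int]  # document expiration time in seconds, if known
    server_port: int
    probe_time: str


# The subset of fields that carry cacheability features.
CACHEABLE_FEATURE_FIELDS = ("url", "resource_type", "resource_attribute", "expires_in_s")


def cacheable_features(record: DomainRecord) -> Dict[str, Any]:
    row = asdict(record)
    return {name: row[name] for name in CACHEABLE_FEATURE_FIELDS}
```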
The collecting step may gather network resource data from the Internet by means of a crawler, preferably a distributed web crawler. A distributed web crawler consists of multiple crawlers that download web pages from the Internet, for example from high-traffic news websites (Sina, Sohu, and the like), and obtain the domain name data related to those websites for the subsequent cacheability analysis of domain name resources. The resource data (domain name data) of each domain name is then imported into the database, while URLs are extracted from the page content and crawling continues along those URLs.
To meet different requirements, the parameters of the crawling strategy can be customized, and the crawling intensity and crawling mode best suited to the actual situation can be selected. For example, the crawling level can be adjusted according to the actual load of the probe node, increasing or decreasing the crawling depth. Different crawling modes can also be configured as needed, namely direct crawling and search-engine crawling: direct crawling is highly accurate, while search-engine crawling offers wide coverage and avoids failures caused by the security protection of some websites or by websites that have no direct entry point.
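The crawl-and-follow loop described in the two paragraphs above can be sketched as a breadth-first search with a depth limit, which plays the role of the adjustable crawling depth. This is an illustrative single-process sketch, not a distributed crawler: the `fetch` callback (URL in, HTML text out) is a hypothetical stand-in for real HTTP downloading, links are taken only from `<a href>` tags using the standard-library `html.parser`, and search-engine crawling is not modelled.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Set
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed: str, fetch: Callable[[str], str], max_depth: int = 2) -> Dict[str, Set[str]]:
    """Breadth-first crawl from `seed`, following links up to `max_depth` hops.

    Returns a mapping from host name to the set of URLs seen under it.
    """
    seen: Set[str] = {seed}
    queue = deque([(seed, 0)])
    urls_by_host: Dict[str, Set[str]] = {}
    while queue:
        url, depth = queue.popleft()
        host = urlsplit(url).hostname or ""
        urls_by_host.setdefault(host, set()).add(url)
        if depth >= max_depth:
            continue  # record the URL but do not expand it further
        extractor = LinkExtractor()
        extractor.feed(fetch(url))
        extractor.close()
        for href in extractor.links:
            # Resolve relative links and drop "#fragment" parts before de-duplicating.
            absolute = urldefrag(urljoin(url, href))[0]
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
    return urls_by_host
```

In a real deployment `fetch` might wrap `urllib.request.urlopen`; passing it in keeps the traversal logic testable offline.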
Besides crawling, network resource data can also be collected in other ways, such as from DNS logs, DPI data, and packet capture.
Step S200, extracting step: extract the domain name to be analyzed and its cacheable feature data from the original domain name data, and temporarily store them.
The domain name data collected in step S100 contains a great deal of information, of which the method of the first embodiment needs only a part; the domain name to be analyzed and its cacheable feature data must therefore be extracted from the collected data. Cacheable data is easy to invoke and can be used efficiently over a long period. The cacheable feature data includes, for example, the URL, the Internet resource type, the resource attribute, and the document expiration time. The Internet resource type is classified according to the service characteristics of the resource, such as video or web page. The resource attribute is dynamic or static: a dynamic resource is invoked after data conversion, for example a resource whose data must be read from a database and converted into HTML (HyperText Markup Language) before it is invoked; a static resource can be invoked directly, for example a URL without any parameters. Static resource attributes include a cacheable object attribute, which refers to a resource returned with a cacheable HTTP (HyperText Transfer Protocol) status code, namely 200, 203, 300, or 301. They also include a non-cacheable object attribute, which covers: HTTP/1.0 responses containing Set-Cookie; HTTP/1.1 responses containing Set-Cookie with Cache-Control set to "no-cache" or "private"; messages containing "Pragma: no-cache" or "Authorization", or with Cache-Control set to "no-cache", "no-store", or "private"; and responses that carry no Last-Modified information. Static resource attributes further include a file length attribute and a file change period attribute.
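The cacheable and non-cacheable rules above can be expressed as a predicate over one HTTP exchange. This is a sketch of one reading of the text, not a complete HTTP caching implementation (RFC 9111 defines many more rules): the function names are hypothetical, "static" is approximated as "URL with no query string", and `headers` is assumed to merge request and response headers, because the rules mention both `Authorization` and `Set-Cookie`.

```python
from typing import Mapping, Set

# Status codes the text lists as cacheable.
CACHEABLE_STATUS_CODES = {200, 203, 300, 301}


def is_static(url: str) -> bool:
    """Treat a URL without query parameters as a static resource."""
    return "?" not in url


def cache_control_directives(headers: Mapping[str, str]) -> Set[str]:
    """Directive names from Cache-Control, e.g. {"private", "max-age"}."""
    value = headers.get("cache-control", "")
    return {part.strip().split("=", 1)[0].lower() for part in value.split(",") if part.strip()}


def is_cacheable_object(status: int, headers: Mapping[str, str], http_version: str = "1.1") -> bool:
    """Apply the cacheable / non-cacheable rules listed above to one exchange."""
    h = {name.lower(): value for name, value in headers.items()}
    if status not in CACHEABLE_STATUS_CODES:
        return False
    if http_version == "1.0" and "set-cookie" in h:
        return False
    if "authorization" in h or "no-cache" in h.get("pragma", "").lower():
        return False
    # Also covers the HTTP/1.1 "Set-Cookie with no-cache/private" case.
    if cache_control_directives(h) & {"no-cache", "no-store", "private"}:
        return False
    return "last-modified" in h  # no Last-Modified information means non-cacheable
```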
To make the data convenient to access during the subsequent analysis and calculation, the cacheable feature data is temporarily stored in a suitable data structure, and an index is built over it.
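One simple choice of data structure is an in-memory index keyed by domain name, sketched below with hypothetical field names; a real deployment might instead use a database table with an index on the domain column.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class Feature(NamedTuple):
    """Cacheable feature data for one URL (illustrative field names)."""
    domain: str
    url: str
    resource_type: str
    is_static: bool
    max_age_s: Optional[int]


def build_feature_index(features: Iterable[Feature]) -> Dict[str, List[Feature]]:
    """Group feature rows by lower-cased domain so each domain's rows load in one lookup."""
    index: Dict[str, List[Feature]] = defaultdict(list)
    for feature in features:
        index[feature.domain.lower()].append(feature)
    return dict(index)
```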
Step S300, calculating and judging step: calculate the cache value of the domain name from the cacheable feature data obtained in step S200 using a cacheability analysis algorithm, and judge the cacheability of the domain name from the cache value.
The cacheability analysis algorithm scores the cacheable feature data by weighted summation according to whether it is easy to invoke, relatively stable, and usable over a long period, which yields the cache value of the domain name; the cache value is then compared with a preset value to judge the cacheability of the domain name.
A simple algorithm calculates two indexes from the cacheable feature data of the domain name to be analyzed: the proportion of static resources among all URL resources, and the proportion of static resources whose document expiration time is greater than the cache time limit.
The cache value of the domain name is obtained by weighted summation of the two index values with their respective weights. When the cache value is greater than a preset cache threshold, the cacheability of the domain name is judged as "suggested cache"; when it is less than or equal to the threshold, it is judged as "non-suggested cache".
This algorithm is simple to compute and considers only the proportion of static resources and the proportion of static resources whose expiration time exceeds the cache time limit.
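A minimal sketch of the two-index algorithm, assuming each URL is summarised as a hypothetical `(is_static, max_age_s)` pair and that the second index is taken over static resources only; the weights, cache time limit, and threshold in the usage line are illustrative values, not values from the patent.

```python
from typing import Optional, Sequence, Tuple


def two_index_cache_value(
    resources: Sequence[Tuple[bool, Optional[int]]],
    cache_time_limit_s: int,
    w1: float,
    w2: float,
) -> float:
    """resources: (is_static, max_age_s) per URL; returns S = w1*i1 + w2*i2."""
    if not resources:
        return 0.0
    static_ages = [age for is_static, age in resources if is_static]
    i1 = len(static_ages) / len(resources)  # share of static resources
    long_lived = sum(1 for age in static_ages if age is not None and age > cache_time_limit_s)
    i2 = long_lived / len(static_ages) if static_ages else 0.0
    return w1 * i1 + w2 * i2


def judge(cache_value: float, threshold: float) -> str:
    return "suggested cache" if cache_value > threshold else "non-suggested cache"


s = two_index_cache_value([(True, 86400), (True, 60), (False, None), (True, None)], 3600, 0.6, 0.4)
print(round(s, 4), judge(s, 0.5))  # → 0.5833 suggested cache
```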
There is also a more accurate and refined algorithm. Fig. 2 is a detailed flowchart of step S300 in the method for analyzing the cacheability of a domain name according to the first embodiment of the present invention; the step includes the following three sub-steps:
Step S301: calculate five indexes from the cacheable feature data, namely the proportion of static resources among all URL resources, the proportion of cacheable objects among static resources, the proportion of non-cacheable objects among static resources, the proportion of large files among the cacheable objects in the static resources, and the proportion of static resources whose document expiration time is greater than the cache time limit.
Here, a cacheable object is a static resource whose change period exceeds a set period, and a non-cacheable object is a static resource whose change period is less than or equal to the set period; the set period can be chosen from empirical values and the actual situation and is not limited here. Generally, JS (JavaScript) files, CSS (Cascading Style Sheets) files, and pictures do not change for long periods and are cacheable objects, whereas resources that must fetch the latest data on every request are non-cacheable objects.
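The text does not say how a resource's change period is measured. One possible approach, sketched here purely as an assumption, estimates it from the distinct `Last-Modified` values observed across repeated probes and compares the mean gap with the set period; a resource that never changed during observation is treated as long-lived. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Sequence


def is_cacheable_by_change_period(modified_times: Sequence[datetime], set_period: timedelta) -> bool:
    """Classify one static resource from the Last-Modified values seen across probes.

    The change period is estimated as the mean gap between distinct
    modification times. If no change was observed, the resource is
    treated as long-lived (an assumption).
    """
    distinct = sorted(set(modified_times))
    if len(distinct) < 2:
        return True
    mean_gap = (distinct[-1] - distinct[0]) / (len(distinct) - 1)
    return mean_gap > set_period  # strictly greater than the set period
```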
In the first embodiment of the present invention, the cacheability of a domain name is judged mainly from the cacheable feature data collected by the crawlers: five indexes are calculated from this data and evaluated together to determine the cacheability of the domain name. The five indexes are:
(1) The proportion of static resources in the total number of URLs under the domain name, denoted t1.
(2) The proportion of cacheable objects among the static resources, denoted t2.
(3) The proportion of non-cacheable objects among the static resources, denoted t3.
(4) The proportion of large files among the cacheable objects in the static resources, denoted t4, where a large file is one whose size is at least fmax; fmax can be customized according to actual requirements.
(5) The proportion of files in the static resources whose expiration time exceeds the cache time limit, that is, whose max-age is greater than tmax, denoted t5; tmax can be customized according to actual requirements.
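The five indexes can be computed from per-URL feature records roughly as below. This is a sketch under stated assumptions: the `Resource` record and `five_indexes` function are hypothetical names, t5 is taken over all static resources as its denominator, and an empty denominator is treated as 0.0. The fmax of 10 MB and tmax of 10 minutes in the usage example are the sample values from Table 2.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Resource:
    """Hypothetical per-URL record of cacheable feature data."""
    url: str
    is_static: bool
    size_bytes: int
    change_period_s: float
    max_age_s: Optional[float]  # None when no max-age was observed


def _ratio(num: int, den: int) -> float:
    # Assumption: an empty denominator yields 0.0 rather than an error.
    return num / den if den else 0.0


def five_indexes(resources: List[Resource], set_period_s: float,
                 fmax_bytes: int, tmax_s: float) -> Tuple[float, ...]:
    """Return (t1, t2, t3, t4, t5) for the resources of one domain name."""
    static = [r for r in resources if r.is_static]
    cacheable = [r for r in static if r.change_period_s > set_period_s]
    large = [r for r in cacheable if r.size_bytes >= fmax_bytes]
    long_lived = [r for r in static if r.max_age_s is not None and r.max_age_s > tmax_s]
    t1 = _ratio(len(static), len(resources))
    t2 = _ratio(len(cacheable), len(static))
    t3 = _ratio(len(static) - len(cacheable), len(static))
    t4 = _ratio(len(large), len(cacheable))
    t5 = _ratio(len(long_lived), len(static))  # assumed denominator: all static resources
    return (t1, t2, t3, t4, t5)


MB = 1024 * 1024
sample = [
    Resource("/a.js", True, 2 * MB, 7 * 86400, 86400),
    Resource("/b.png", True, 20 * MB, 30 * 86400, 3600),
    Resource("/c.json", True, 1024, 60, None),
    Resource("/api/user", False, 512, 0, None),
]
t = five_indexes(sample, set_period_s=3600, fmax_bytes=10 * MB, tmax_s=600)
print([round(x, 3) for x in t])
```

In this sample, three of the four URLs are static, two of those three are cacheable objects, one of the two cacheable objects is a large file, and two static files carry a max-age above tmax.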
Step S302: calculate the cache value of the domain name from the five indexes using a weighted-sum formula.
To evaluate the five indexes together, a weighted-sum formula is used: $S = w_1 t_1 + w_2 t_2 + w_3 t_3 + w_4 t_4 + w_5 t_5$, where S is the cache value and w1, w2, w3, w4 and w5 are the weights of t1, t2, t3, t4 and t5, respectively. Substituting the index values into this formula yields the cache value S.
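The weighted sum is a one-line computation; the sketch below is a minimal Python version. The function name `cache_value` is hypothetical, the weights are the example values from Table 2, and the index values are made up for illustration.

```python
from typing import Sequence


def cache_value(indexes: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum S = w1*t1 + w2*t2 + ... + wn*tn."""
    if len(indexes) != len(weights):
        raise ValueError("expected exactly one weight per index")
    return sum(w * t for w, t in zip(weights, indexes))


# Example weights w1..w5 taken from Table 2; the index values are hypothetical.
WEIGHTS = (0.05, 0.45, 0.05, 0.05, 0.4)
s = cache_value((0.4, 0.9, 0.1, 0.0, 0.5), WEIGHTS)
print(f"S = {s:.3f}")
```

Keeping the weights as a separate argument lets the same function serve both the five-index and the two-index variants described later.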
Step S303: judge the cacheability of the domain name by comparing the cache value with a preset cache threshold.
Specifically, denote the preset cache threshold as a. When S > a, the cacheability of the domain name is judged as a suggested cache; when S ≤ a, it is judged as a non-suggested cache. The preset cache threshold can be set by combining empirical values with the actual occupation of hard disk space.
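The decision rule reduces to a strict comparison, sketched below. The `Suggestion` enum and `judge` function are hypothetical names; the threshold 0.6 in the usage lines is the example value of a from Table 2.

```python
from enum import Enum


class Suggestion(Enum):
    SUGGESTED = "suggested cache"
    NOT_SUGGESTED = "non-suggested cache"


def judge(cache_value: float, threshold: float) -> Suggestion:
    """S > a gives a suggested cache; S <= a gives a non-suggested cache."""
    if cache_value > threshold:
        return Suggestion.SUGGESTED
    return Suggestion.NOT_SUGGESTED


for s in (0.65, 0.6, 0.4):
    print(s, judge(s, threshold=0.6).value)
```

Note that the boundary case S = a is classified as a non-suggested cache.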
Table 1 lists the five indexes and their limit values, which can be customized according to actual requirements. For example, based on the actual occupation of hard disk space and the cache hit situation, the limit for t1 (the proportion of static resources in the total number of URLs) can be set to ≤ 0.5, in which case a website with t1 > 0.5 is judged as a non-suggested cache.
Table 2 lists the weights of the five indexes together with fmax, tmax, and the preset cache threshold a. These parameters can also be customized according to actual requirements; for example, the preset cache threshold a may be determined from the memory situation, the domain name situation, and the caching target to be achieved. The values in Table 2 are only an example.
TABLE 1
Index        t1    t2    t3    t4    t5
Limit value  ≤0.5  ≥0.8  ≤0.2  ≤0.2  ≥0.3
TABLE 2
Parameter  w1    w2    w3    w4    w5   fmax   tmax    a
Value      0.05  0.45  0.05  0.05  0.4  10 MB  10 min  0.6
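The limit values in Table 1 can be checked mechanically, as in the following sketch. How a violated limit combines with the cache value S is only spelled out for t1, so returning a list of flagged indexes (rather than a final verdict) is an assumption of this sketch; `LIMITS` and `violated_limits` are hypothetical names.

```python
import operator
from typing import Dict, List

# Limit values from Table 1, stored as (comparison, bound).
LIMITS = {
    "t1": (operator.le, 0.5),
    "t2": (operator.ge, 0.8),
    "t3": (operator.le, 0.2),
    "t4": (operator.le, 0.2),
    "t5": (operator.ge, 0.3),
}


def violated_limits(indexes: Dict[str, float]) -> List[str]:
    """Return the names of the indexes that fall outside their Table 1 limit."""
    return [name for name, (cmp, bound) in LIMITS.items()
            if not cmp(indexes[name], bound)]


d = {"t1": 0.6, "t2": 0.85, "t3": 0.15, "t4": 0.1, "t5": 0.2}
print(violated_limits(d))
```

Here t1 = 0.6 exceeds its 0.5 ceiling and t5 = 0.2 falls below its 0.3 floor, so both are flagged.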
Using the parameters in Table 2, suppose the indexes t1 to t5 of domain name 1 are 0.5, 0.8, 0.2, 0.2 and 0.5, respectively. Substituting them into the weighted-sum formula $S = w_1 t_1 + w_2 t_2 + w_3 t_3 + w_4 t_4 + w_5 t_5$ gives $S_1 = 0.05 \times 0.5 + 0.45 \times 0.8 + 0.05 \times 0.2 + 0.05 \times 0.2 + 0.4 \times 0.5 = 0.605$. Since S1 > a = 0.6, domain name 1 is judged as a suggested cache.
If the indexes of domain name 2 are 0.5, 0.8, 0.2, 0.1 and 0.4, respectively, the same formula gives $S_2 = 0.05 \times 0.5 + 0.45 \times 0.8 + 0.05 \times 0.2 + 0.05 \times 0.1 + 0.4 \times 0.4 = 0.56$. Since S2 < a = 0.6, domain name 2 is judged as a non-suggested cache.
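The two worked examples can be reproduced with the Table 2 parameters as below. For domain name 1 the sketch takes t4 = 0.2 and t5 = 0.5, the values under which the stated S1 = 0.605 works out; `score` is a hypothetical helper name.

```python
WEIGHTS = (0.05, 0.45, 0.05, 0.05, 0.4)  # w1..w5 from Table 2
THRESHOLD_A = 0.6                          # preset cache threshold a from Table 2


def score(t):
    """Weighted sum S = w1*t1 + ... + w5*t5."""
    return sum(w * x for w, x in zip(WEIGHTS, t))


domain_1 = (0.5, 0.8, 0.2, 0.2, 0.5)
domain_2 = (0.5, 0.8, 0.2, 0.1, 0.4)

for name, t in (("domain name 1", domain_1), ("domain name 2", domain_2)):
    s = score(t)
    verdict = "suggested cache" if s > THRESHOLD_A else "non-suggested cache"
    print(f"{name}: S = {s:.3f} -> {verdict}")
```

The margin in the first case is small (0.605 against 0.6), which illustrates how sensitive the verdict is to the choice of a.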
Step S400: output the cacheability judgment result for the domain name to be analyzed.
In this way, the method arrives at an overall judgment of the cacheability of the domain name.
Preferably, the judgment results are output as a list so that the user can review them easily. The output list may include, for example: the domain name, the attribution of the domain name, the Internet resource type, the resource attribute, the cacheable proportion, whether HTTPS is used, the server port, the probe time, and the cache suggestion.
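One way to emit the judgment list is as CSV, sketched below with the standard `csv` module. The `JudgmentRow` fields mirror the columns listed above, but the field names and the sample row values are hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class JudgmentRow:
    """Hypothetical row of the judgment result list."""
    domain: str
    attribution: str
    resource_type: str
    resource_attribute: str
    cacheable_ratio: float
    https: bool
    server_port: int
    probe_time: str
    cache_suggestion: str


def to_csv(rows: List[JudgmentRow]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(JudgmentRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


rows = [JudgmentRow("img.example.com", "example-ISP", "image", "static",
                    0.82, True, 443, "2018-07-01T10:00:00", "suggested cache")]
csv_text = to_csv(rows)
print(csv_text)
```

CSV is just one convenient list format; the description does not prescribe a particular output format.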
Further, a cacheable domain name library may be established according to the above cache suggestions. The library may contain the domain name, the attribution of the domain name, the resource type, the cacheable proportion, whether HTTPS is used, the server port, the cache suggestion, the probe time, and so on, and is used to store, query, update, and analyze domain names. The library is adjusted over time, for example by adding cacheable domain names and deleting non-cacheable ones, so that it always reflects the latest cacheable domain names. This reduces caching of domain names with low efficiency and low cache hit rates, avoids wasting resources such as memory, increases caching of domain names with high efficiency and high cache hit rates, and improves access efficiency.
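A minimal sketch of such a library using Python's built-in `sqlite3` follows. The table name `cacheable_domains`, its columns, and the helpers `open_library`, `apply_judgment` and `lookup` are hypothetical; the add-or-delete policy follows the "add cacheable, delete non-cacheable" maintenance described above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS cacheable_domains (
    domain TEXT PRIMARY KEY,
    attribution TEXT,
    resource_type TEXT,
    cacheable_ratio REAL,
    https INTEGER,
    server_port INTEGER,
    cache_suggestion TEXT,
    probe_time TEXT
)
"""

INSERT = (
    "INSERT OR REPLACE INTO cacheable_domains VALUES "
    "(:domain, :attribution, :resource_type, :cacheable_ratio, "
    ":https, :server_port, :cache_suggestion, :probe_time)"
)


def open_library(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def apply_judgment(conn: sqlite3.Connection, record: dict) -> None:
    """Add or refresh a domain judged as a suggested cache; delete any other."""
    if record["cache_suggestion"] == "suggested cache":
        conn.execute(INSERT, record)
    else:
        conn.execute("DELETE FROM cacheable_domains WHERE domain = ?", (record["domain"],))
    conn.commit()


def lookup(conn: sqlite3.Connection, domain: str):
    return conn.execute(
        "SELECT * FROM cacheable_domains WHERE domain = ?", (domain,)
    ).fetchone()


conn = open_library()
base = {"attribution": "example-ISP", "resource_type": "image", "cacheable_ratio": 0.85,
        "https": 1, "server_port": 443, "probe_time": "2018-07-01T10:00:00"}
apply_judgment(conn, {**base, "domain": "img.example.com", "cache_suggestion": "suggested cache"})
apply_judgment(conn, {**base, "domain": "cdn.example.com", "cache_suggestion": "suggested cache"})
# A later probe no longer suggests caching img.example.com, so it is removed.
apply_judgment(conn, {**base, "domain": "img.example.com", "cache_suggestion": "non-suggested cache"})
print(lookup(conn, "img.example.com"), lookup(conn, "cdn.example.com"))
```

`INSERT OR REPLACE` keyed on the domain keeps a single, most recent entry per domain name.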
It should be noted that the collection step may acquire network resource data in various ways, such as crawlers, DNS logs, DPI data, and packet capture.
In the embodiment of the present invention, domain name data is first collected and imported into an original domain name database; the domain name to be analyzed and its cacheable feature data are then extracted from that database; a cache value of the domain name is calculated from the cacheable feature data using the cacheability analysis algorithm; the cacheability of the domain name is judged from the cache value; and finally the judgment result is output. The method therefore gives a cache suggestion by judging whether a domain name is suitable for caching. Following these suggestions, cacheable domain names are added and non-cacheable domain names are deleted, which reduces caching of domain names with low efficiency and low cache hit rates, avoids wasting resources such as memory, increases caching of domain names with high efficiency and high cache hit rates, and improves domain name access efficiency.
The method of the first embodiment further provides a domain name comparison function against the cacheable domain name library, which is carried out in the collection step as follows. After the domain name data is collected, it is compared with the domain names in the cacheable domain name library. If the library already stores information for the domain name, the cacheability suggestion stored in the library is used directly and the method proceeds straight to the output step; that is, the related information and cache suggestion of the domain name are obtained quickly through comparison. If the library holds no information for the domain name to be analyzed, the collected domain name data is imported into the original domain name database and the method proceeds to the extraction step. This avoids repeated domain name resolution requests, allows the IP address to be read quickly, reduces repeated analysis and judgment, and improves the efficiency of cacheability judgment.
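The compare-before-analyze flow can be sketched as follows. A plain dict stands in for the cacheable domain name library, and `fake_analyze` is a hypothetical stand-in for the full extraction and calculation steps. Storing only suggested domains after a miss is an assumption of this sketch, chosen to match the "delete non-cacheable domain names" maintenance policy.

```python
from typing import Callable, Dict

SUGGESTED = "suggested cache"


def judge_domain(domain: str, library: Dict[str, str],
                 analyze: Callable[[str], str]) -> str:
    """Compare against the cacheable domain name library before analyzing."""
    stored = library.get(domain)
    if stored is not None:
        return stored                  # hit: reuse the stored suggestion, go to output
    suggestion = analyze(domain)       # miss: import, extract, calculate and judge
    if suggestion == SUGGESTED:
        library[domain] = suggestion   # assumption: only suggested domains are kept
    return suggestion


calls = []


def fake_analyze(domain: str) -> str:
    """Hypothetical stand-in for the full cacheability analysis."""
    calls.append(domain)
    return SUGGESTED if domain.startswith("img.") else "non-suggested cache"


library: Dict[str, str] = {}
for d in ("img.example.com", "img.example.com", "api.example.com"):
    print(d, "->", judge_domain(d, library, fake_analyze))
print("full analyses run for:", calls)
```

The second request for `img.example.com` is answered from the library, so the full analysis runs only twice for three requests.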
To analyze the cacheability of domain names further, the method of the first embodiment also extracts network-wide outgoing traffic from the original domain name database, counts and analyzes the outgoing request information, and produces a ranking of network-wide outgoing traffic, giving the user an additional basis for deciding whether to cache a website. Preferably, the network-wide outgoing traffic is mirrored and analyzed: the details of all outgoing requests are counted and their characteristics analyzed to produce decision information, including resource types, Content-Length attributes, cookies, return codes, and cache suggestions. For example, the direction of the traffic is analyzed, and if cross-province or cross-network traffic is large (generally, more than 30% of the total traffic; this threshold can be adjusted to actual conditions), caching the domain name corresponding to that traffic is suggested.
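A simplified version of the traffic ranking and the cross-province check might look like the sketch below. The flow record layout and the function name `rank_egress` are hypothetical, and measuring the 30% share against each domain's own outgoing traffic (rather than against the whole network) is an assumption, since the text does not pin down the denominator.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# Hypothetical flow record: (domain, bytes, crosses_province_or_network)
Flow = Tuple[str, int, bool]


def rank_egress(flows: Iterable[Flow],
                cross_share_threshold: float = 0.3) -> List[Tuple[str, int, float, bool]]:
    """Rank domains by outgoing bytes and flag those whose cross-province or
    cross-network share exceeds the threshold (assumed per-domain share)."""
    total = defaultdict(int)
    cross = defaultdict(int)
    for domain, nbytes, is_cross in flows:
        total[domain] += nbytes
        if is_cross:
            cross[domain] += nbytes
    ranking = []
    for domain, t in sorted(total.items(), key=lambda kv: kv[1], reverse=True):
        share = cross[domain] / t if t else 0.0
        ranking.append((domain, t, share, share > cross_share_threshold))
    return ranking


flows = [
    ("video.example.com", 700, True),
    ("video.example.com", 300, False),
    ("news.example.com", 400, False),
    ("news.example.com", 100, True),
]
for domain, nbytes, share, suggest in rank_egress(flows):
    print(f"{domain}: {nbytes} bytes, cross share {share:.0%}, suggest caching: {suggest}")
```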
When the cacheability of a domain name cannot be judged because of some problem, the information must be reported promptly to maintenance personnel for handling. Specifically, for a domain name whose cacheability cannot be judged, the collected information such as URLs, resource types, and max-age values is returned to the maintenance personnel, who locate and assess the problem as soon as possible and update the caching algorithm accordingly, so that the accuracy of the cache engine keeps improving.
Based on the above analysis, the method for analyzing the cacheability of a domain name provided by the embodiment of the present invention has the following beneficial effects: (1) it judges whether a domain name is suitable for caching and gives a cache suggestion; (2) the cacheability judgment results can be output as a list for convenient review by the user; (3) it facilitates subsequent adjustment of cacheable domain names and forms a dynamic cacheable domain name library in which cacheable domain names are added and non-cacheable ones deleted, thereby reducing caching of domain names with low efficiency and low cache hit rates, avoiding waste of resources such as memory, increasing caching of domain names with high efficiency and high cache hit rates, and improving domain name access efficiency; (4) through the domain name comparison function of the cacheable domain name library, the stored cacheability judgment can be used directly, reducing repeated analysis and judgment and improving the efficiency of cacheability judgment.
Example two
Corresponding to the method of the first embodiment, an embodiment of the present invention further provides a system for analyzing the cacheability of a domain name, which executes the method of the first embodiment. Fig. 3 is a schematic diagram of the module composition of this system according to the second embodiment of the present invention. As shown in Fig. 3, the system includes: an acquisition module 100, configured to collect domain name data and import it into an original domain name database; an extraction module 200, configured to extract the domain name to be analyzed and its cacheable feature data from the original domain name data and temporarily store them; a calculation and judgment module 300, configured to calculate a cache value of the domain name from the cacheable feature data obtained by the extraction module using the cacheability analysis algorithm, and to judge the cacheability of the domain name by comparing the cache value with a preset value; and an output module 400, configured to output the cacheability analysis and judgment result obtained by the calculation and judgment module.
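The chaining of the four modules can be sketched as a small pipeline of callables. Everything here is illustrative: `CacheabilityPipeline`, `group_by_domain`, `toy_judge` and `print_results` are hypothetical names, and `toy_judge` is a placeholder rule standing in for the weighted-sum algorithm, not the algorithm itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CacheabilityPipeline:
    """Hypothetical wiring of the four modules; each stage is a plain callable."""
    collect: Callable[[], List[dict]]                  # acquisition module 100
    extract: Callable[[List[dict]], Dict[str, list]]   # extraction module 200
    judge: Callable[[list], str]                       # calculation and judgment module 300
    output: Callable[[Dict[str, str]], None]           # output module 400

    def run(self) -> Dict[str, str]:
        raw = self.collect()
        features = self.extract(raw)
        results = {domain: self.judge(recs) for domain, recs in features.items()}
        self.output(results)
        return results


def group_by_domain(rows: List[dict]) -> Dict[str, list]:
    grouped: Dict[str, list] = {}
    for row in rows:
        grouped.setdefault(row["domain"], []).append(row)
    return grouped


def toy_judge(recs: list) -> str:
    # Placeholder rule for illustration only, not the weighted-sum algorithm.
    share = sum(r["static"] for r in recs) / len(recs)
    return "suggested cache" if share > 0.4 else "non-suggested cache"


def print_results(results: Dict[str, str]) -> None:
    for domain, verdict in results.items():
        print(f"{domain}: {verdict}")


pipeline = CacheabilityPipeline(
    collect=lambda: [
        {"domain": "a.example.com", "url": "/x.js", "static": True},
        {"domain": "a.example.com", "url": "/api", "static": False},
        {"domain": "b.example.com", "url": "/q", "static": False},
    ],
    extract=group_by_domain,
    judge=toy_judge,
    output=print_results,
)
results = pipeline.run()
```

Passing each module in as a callable mirrors the note that the division into units is only logical: stages can be swapped, merged, or distributed without changing the flow.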
Preferably, the acquisition module 100 may include a crawler module, a DNS log analysis module, a DPI data analysis module, and a packet capture module, which collect network resource data by crawling, DNS log analysis, DPI data, and packet capture, respectively.
Preferably, the cacheable feature data of the domain name to be analyzed, obtained by the extraction module 200, includes the URL, the Internet resource type, the resource attribute, and the expiration time of the resource file. The Internet resource type is classified according to the service characteristics of the resource. The resource attribute is either dynamic or static: a dynamic resource is one that is invoked after data conversion, and a static resource is one that is invoked directly. For static resources, the cacheable feature data further includes the resource file length and the resource file change period, and static resources are divided into cacheable objects and non-cacheable objects: a cacheable object is a static resource whose file change period exceeds a set period, and a non-cacheable object is a static resource whose file change period is less than or equal to the set period.
Fig. 4 is a schematic diagram of the composition of the calculation and judgment module in the system for analyzing the cacheability of a domain name according to the second embodiment of the present invention. As shown in Fig. 4, the calculation and judgment module 300 includes a calculation unit and a judgment unit, and the calculation unit includes a first calculation unit and a second calculation unit.
The first calculation unit 301 is configured to calculate two indexes from the cacheable feature data of the domain name to be analyzed, namely the proportion of static resources in the total number of resources and the proportion of files in the static resources whose expiration time exceeds the cache time limit, and then to obtain the cache value of the domain name as the weighted sum of the two index values and their weights;
and when the cache value is larger than a preset cache threshold value, judging that the cacheability of the domain name is a suggested cache, and when the cache value is smaller than or equal to the preset cache threshold value, judging that the cacheability of the domain name is a non-suggested cache.
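The first calculation unit's two-index variant can be sketched as follows. The description gives no weights for this variant, so the defaults of 0.4 and 0.6 below are hypothetical, and reusing a = 0.6 from Table 2 as the threshold is likewise an assumption; `two_index_cache_value` and `first_unit_judgment` are hypothetical names.

```python
from typing import Tuple


def two_index_cache_value(static_ratio: float, long_expiry_ratio: float,
                          w_static: float = 0.4, w_expiry: float = 0.6) -> float:
    """Weighted sum of the two indexes used by the first calculation unit.
    The default weights are hypothetical; the description does not fix them."""
    return w_static * static_ratio + w_expiry * long_expiry_ratio


def first_unit_judgment(static_ratio: float, long_expiry_ratio: float,
                        threshold: float = 0.6) -> Tuple[float, str]:
    s = two_index_cache_value(static_ratio, long_expiry_ratio)
    return s, ("suggested cache" if s > threshold else "non-suggested cache")


for pair in ((0.9, 0.7), (0.5, 0.5)):
    s, verdict = first_unit_judgment(*pair)
    print(pair, f"S = {s:.2f}", verdict)
```

The two-index variant needs only URL counts and max-age values, so it could serve as a cheaper first pass when change periods and file lengths are not yet available.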
The second calculation unit 302 is configured to calculate five indexes from the cacheable feature data of the domain name to be analyzed, namely: the proportion of static resources in the total number of resources; the proportion of cacheable objects among the static resources; the proportion of non-cacheable objects among the static resources; the proportion of large files among the cacheable objects in the static resources; and the proportion of files in the static resources whose expiration time exceeds the cache time limit. It then calculates the cache value of the domain name from the five indexes using the weighted-sum formula.
The judgment unit 303 is configured to make the judgment based on the cache value calculated by the first calculation unit 301 or the second calculation unit 302: when the cache value is greater than a preset cache threshold, the cacheability of the domain name to be analyzed is judged as a suggested cache; when it is less than or equal to the threshold, it is judged as a non-suggested cache.
In the embodiment of the present invention, the acquisition module 100 collects domain name data and imports it into an original domain name database; the extraction module 200 extracts the domain name to be analyzed and its cacheable feature data from that database; the calculation and judgment module 300 calculates a cache value of the domain name from the cacheable feature data using the cacheability analysis algorithm and judges the cacheability of the domain name by comparing the cache value with a preset value; and finally the output module 400 outputs the judgment result. The system can therefore judge whether a domain name is suitable for caching and give a cache suggestion. Following these suggestions, cacheable domain names are added and non-cacheable domain names are deleted, which reduces caching of domain names with low efficiency and low cache hit rates, avoids wasting resources such as memory, increases caching of domain names with high efficiency and high cache hit rates, and improves access efficiency.
To reduce repeated analysis and judgment and improve work efficiency, the system of the second embodiment further includes a library building module, configured to build a cacheable domain name library from the cacheability judgment results output by the output module. The library is used to store, query, update, and analyze cacheable domain names and is dynamically maintained to reflect the latest cacheable domain names, with cacheable domain names added and non-cacheable ones deleted. It may include the domain name together with one or more of: the attribution of the domain name, the resource type, the cacheable proportion, whether HTTPS is used, the server port, the cache suggestion, and the probe time. The acquisition module compares the collected domain name data with the domain names in the cacheable domain name library. If the library stores information for the domain name to be analyzed, the acquisition module sends the retrieved information directly to the output module; if the query finds no such information, the collected domain name data is imported into the original domain name database and the extraction module is called to perform the cacheability analysis.
Based on the above analysis, the system for analyzing the cacheability of a domain name provided by the embodiment of the present invention has the following beneficial effects: (1) it judges whether a domain name is suitable for caching and gives a cache suggestion; (2) the cacheability judgment results can be output as a list for convenient review by the user; (3) it facilitates subsequent adjustment of domain names, adding cacheable domain names and deleting non-cacheable ones, thereby reducing caching of domain names with low efficiency and low cache hit rates, avoiding waste of resources such as memory, increasing caching of domain names with high efficiency and high cache hit rates, and improving access efficiency; (4) through the domain name comparison function of the domain name library, the stored cacheability judgment can be used directly, reducing repeated analysis and judgment and improving the efficiency of cacheability judgment.
The system for analyzing the cacheability of a domain name provided by the embodiment of the present invention may be specific hardware on a device, or software or firmware installed on a device, and so on. The system has the same implementation principle and technical effects as the foregoing method embodiment; for brevity, anything not mentioned in the system embodiment can be found in the corresponding content of the method embodiment. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems and units described above can be found in the corresponding processes of the method embodiment and are not repeated here.
In the embodiments provided by the present invention, it should be understood that the disclosed method and system may be implemented in other ways. The above-described system embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and there may be other divisions in actual implementation, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of systems or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage media include a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed herein, modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the present invention and are intended to be covered by its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (11)

1. A method for analyzing cacheability of a domain name, the method comprising:
the collection step comprises: collecting domain name data, and importing the domain name data into an original domain name database;
the extraction step comprises: extracting the domain name to be analyzed and its cacheable feature data from the original domain name data, and temporarily storing the domain name and the feature data;
the calculation and judgment step comprises: calculating a cache value of the domain name according to the cacheable feature data of the domain name to be analyzed and a cacheability analysis algorithm, and judging the cacheability of the domain name by comparing the cache value with a preset value;
the output step comprises: outputting the cacheability analysis and judgment result for the domain name to be analyzed.
2. The method of claim 1, wherein the cacheable feature data comprises a URL, an Internet resource type, a resource attribute, and an expiration time of a resource file, wherein the Internet resource type is classified according to resource service characteristics, and the resource attribute comprises dynamic resources and static resources, a dynamic resource being a resource invoked after data conversion and a static resource being a resource invoked directly.
3. The method of claim 2, wherein the step of computationally determining comprises:
calculating two indexes according to the cacheable feature data of the domain name to be analyzed, the two indexes being the proportion of static resources in the total number of URL resources and the proportion of files in the static resources whose expiration time exceeds the cache time limit;
according to the two index values and the occupied weight thereof, calculating by weighted summation to obtain a cache value of the domain name; and when the cache value is larger than a preset cache threshold value, judging that the cacheability of the domain name is a suggested cache, and when the cache value is smaller than or equal to the preset cache threshold value, judging that the cacheability of the domain name is a non-suggested cache.
4. The method according to claim 2, wherein, for static resources, the cacheable feature data further comprises a resource file length and a resource file change period attribute; the static resources are divided into cacheable objects and non-cacheable objects, a cacheable object being a static resource whose resource file change period exceeds a set period and a non-cacheable object being a static resource whose resource file change period is less than or equal to the set period;
the calculating and judging step comprises the following steps:
calculating five indexes according to the cacheable feature data of the domain name to be analyzed, the five indexes being the proportion of static resources in the total number of URL resources, the proportion of cacheable objects among the static resources, the proportion of non-cacheable objects among the static resources, the proportion of large files among the cacheable objects in the static resources, and the proportion of files in the static resources whose expiration time exceeds the cache time limit;
calculating the cache value of the domain name by weighted summation of the five index values and their respective weights;
and when the cache value is larger than a preset cache threshold value, judging that the cacheability of the domain name is a suggested cache, and when the cache value is smaller than or equal to the preset cache threshold value, judging that the cacheability of the domain name is a non-suggested cache.
5. The method according to claim 3 or 4, wherein after the output step the method further comprises a step of establishing a cacheable domain name library: establishing a cacheable domain name library according to the cacheability judgment result of the domain name, the library being used to store, query, update, and analyze cacheable domain names and to dynamically maintain the latest cacheable domain names; after the cacheable domain name library is established, the collection step comprises: after the domain name data is collected, comparing it with the domain names in the cacheable domain name library; if information corresponding to the domain name to be analyzed is stored in the library, proceeding directly to the output step; and if no such information exists in the library, importing the collected domain name data into the original domain name database.
6. The method of claim 1, wherein the collection step collects network resource data by means of a crawler, DNS logs, DPI data, and packet capture.
7. The method of claim 1, further comprising: extracting network-wide outgoing traffic from the original domain name database, counting and analyzing outgoing request information, providing a ranking of network-wide outgoing traffic, and thereby providing the user with a basis for deciding whether to cache a website.
8. A system for cacheability analysis of a domain name, the system comprising:
the acquisition module is used for acquiring domain name data and importing the domain name data into an original domain name database;
the extraction module is used for extracting the domain name to be analyzed and the cacheable feature data from the original domain name data and temporarily storing the cacheable feature data and the domain name;
the calculation and judgment module is used for calculating the cache value of the domain name according to the cacheable feature data of the domain name to be analyzed, as obtained by the extraction module, and a cacheability analysis algorithm, and for judging the cacheability of the domain name by comparing the cache value with a preset value;
and the output module is used for outputting the analysis and judgment result of the cacheability of the domain name to be analyzed, which is obtained by the calculation and judgment module.
9. The system of claim 8, wherein the cacheable feature data comprises a URL, an Internet resource type, a resource attribute, and an expiration time of a resource file, wherein the Internet resource type is classified according to resource service characteristics, and the resource attribute comprises dynamic resources and static resources, a dynamic resource being a resource invoked after data conversion and a static resource being a resource invoked directly; for static resources, the cacheable feature data further comprises a resource file length and a resource file change period attribute; the static resources are divided into cacheable objects and non-cacheable objects, a cacheable object being a static resource whose resource file change period exceeds a set period and a non-cacheable object being a static resource whose resource file change period is less than or equal to the set period.
10. The system according to claim 9, wherein the calculation and judgment module comprises a calculation unit and a judgment unit, and the calculation unit comprises a first calculation unit and a second calculation unit, wherein:
the first calculation unit is used for calculating two indexes according to the cacheable feature data of the domain name to be analyzed, the two indexes being the proportion of static resources in the total number of resources and the proportion of files in the static resources whose expiration time exceeds the cache time limit, and for obtaining the cache value of the domain name by weighted summation of the two index values and their weights;
and when the cache value is greater than a preset cache threshold, the cacheability of the domain name is judged as recommended for caching, and when the cache value is less than or equal to the preset cache threshold, the cacheability of the domain name is judged as not recommended for caching.
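The first calculation unit and the threshold comparison can be sketched as below. This assumes the second index uses the number of static resources as its denominator, which the claim wording leaves open; the equal 0.5/0.5 weights, the helper names, and any threshold values are hypothetical placeholders, since the patent does not state them.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Resource:
    # Hypothetical minimal record: only the fields the two indexes need.
    is_static: bool          # resource attribute: True = static, False = dynamic
    expires_seconds: float   # expiration time of the resource document


def two_index_cache_value(
    resources: Sequence[Resource],
    cache_time_limit: float,
    w_static: float = 0.5,   # placeholder weight, not given in the patent
    w_expiry: float = 0.5,   # placeholder weight, not given in the patent
) -> float:
    """Weighted sum of the two claim-10 indexes for one domain name."""
    if not resources:
        return 0.0
    static = [r for r in resources if r.is_static]
    # Index 1: share of static resources among all resources.
    p_static = len(static) / len(resources)
    # Index 2 (assumed denominator: static resources): share of static
    # resources whose document expiration time exceeds the cache time limit.
    long_lived = sum(1 for r in static if r.expires_seconds > cache_time_limit)
    p_long_expiry = long_lived / len(static) if static else 0.0
    return w_static * p_static + w_expiry * p_long_expiry


def judge(cache_value: float, threshold: float) -> str:
    """Strictly greater than the threshold means caching is recommended."""
    if cache_value > threshold:
        return "recommended for caching"
    return "not recommended for caching"
```

For example, three static resources out of four, two of which have expiration times above a 3600-second limit, give 0.5 × 0.75 + 0.5 × 2/3 ≈ 0.708; a value exactly equal to the threshold is judged not recommended, matching the "less than or equal to" wording.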
The second calculation unit is used for calculating five indexes from the cacheable feature data of the domain name to be analyzed, the five indexes being: the proportion of static resources to the total number of resources; the proportion of cacheable objects among the static resources; the proportion of non-cacheable objects among the static resources; the proportion of large files among the cacheable objects in the static resources; and the proportion of static resources whose file expiration time is greater than the cache time limit; the cache value of the domain name is obtained by weighted summation of the five index values according to their respective weights;
and the judgment unit is used for making the judgment based on the cache value obtained by the first calculation unit or the second calculation unit: when the cache value is greater than the preset cache threshold, the cacheability of the domain name is judged as recommended for caching, and when the cache value is less than or equal to the preset cache threshold, it is judged as not recommended for caching.
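The five-index variant and the shared judgment unit can be sketched in the same way. The patent does not give the weight values (or their signs; a negative weight on the non-cacheable share would be one plausible choice), nor the size that counts as a "large file", so `large_file_bytes`, the weights, and all helper names here are hypothetical. As in the two-index sketch, the fifth index is assumed to use the static resources as its denominator.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Resource:
    # Hypothetical record holding the claim-9 feature data for one resource.
    is_static: bool               # resource attribute: static or dynamic
    expires_seconds: float        # expiration time of the resource document
    length_bytes: int             # resource file length
    change_period_seconds: float  # resource file change period


def _ratio(part: int, whole: int) -> float:
    # Guard against empty groups (e.g. a domain with no static resources).
    return part / whole if whole else 0.0


def five_indexes(
    resources: Sequence[Resource],
    set_period: float,
    cache_time_limit: float,
    large_file_bytes: int,        # hypothetical "large file" cut-off
) -> List[float]:
    static = [r for r in resources if r.is_static]
    cacheable = [r for r in static if r.change_period_seconds > set_period]
    non_cacheable = [r for r in static if r.change_period_seconds <= set_period]
    large = [r for r in cacheable if r.length_bytes >= large_file_bytes]
    long_lived = [r for r in static if r.expires_seconds > cache_time_limit]
    return [
        _ratio(len(static), len(resources)),      # static / all resources
        _ratio(len(cacheable), len(static)),      # cacheable objects / static
        _ratio(len(non_cacheable), len(static)),  # non-cacheable objects / static
        _ratio(len(large), len(cacheable)),       # large files / cacheable objects
        _ratio(len(long_lived), len(static)),     # expiry > limit / static (assumed)
    ]


def weighted_cache_value(indexes: Sequence[float], weights: Sequence[float]) -> float:
    if len(indexes) != len(weights):
        raise ValueError("need exactly one weight per index")
    return sum(i * w for i, w in zip(indexes, weights))


def judge(cache_value: float, threshold: float) -> str:
    # Same rule as the judgment unit: strictly greater means recommended.
    return ("recommended for caching" if cache_value > threshold
            else "not recommended for caching")
```

Keeping the index computation separate from the weighting lets either calculation unit feed the same `judge` function, which mirrors the claim's single judgment unit serving both calculation units.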
11. The system according to any one of claims 8 to 10, further comprising a library building module configured to build a cacheable domain name library from the cacheability judgment results output by the output module, to store, query, update, and analyze cacheable domain names, and to dynamically maintain the latest set of cacheable domain names; the acquisition module compares the acquired domain name data with the domain names in the cacheable domain name library: if domain name information corresponding to the domain name to be analyzed is stored in the cacheable domain name library, the queried domain name information is sent directly to the output module; if no such information is found in the cacheable domain name library, the acquired domain name data is imported into the original domain name database and the extraction module is invoked.
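The lookup-then-analyze flow of claim 11 can be sketched with an in-memory dictionary standing in for the cacheable domain name library. This sketch assumes the library stores the judgment result for every analyzed domain (the claim could also be read as storing only domains judged cacheable), and `CacheableDomainLibrary`, `analyze_domain`, and the `extract_and_judge` callback are hypothetical names for the extraction and calculation/judgment modules.

```python
from typing import Callable, Dict, List, Optional


class CacheableDomainLibrary:
    """Hypothetical in-memory stand-in for the cacheable domain name library."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}

    def query(self, domain: str) -> Optional[str]:
        return self._results.get(domain)

    def update(self, domain: str, result: str) -> None:
        # Overwrite so the library keeps the latest judgment per domain.
        self._results[domain] = result


def analyze_domain(
    domain: str,
    library: CacheableDomainLibrary,
    original_db: List[str],
    extract_and_judge: Callable[[str], str],
) -> str:
    """Library lookup first; fall back to full analysis on a miss."""
    hit = library.query(domain)
    if hit is not None:
        return hit  # found: hand the stored result straight to output
    original_db.append(domain)          # import into the original domain name database
    result = extract_and_judge(domain)  # extraction + calculation/judgment modules
    library.update(domain, result)      # library building module records the output
    return result
```

On a repeated query for the same domain, the stored result is returned without invoking `extract_and_judge` again, which is the saving the cacheable domain name library is meant to provide.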

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810720010.0A CN110677270B 2018-07-03 2018-07-03 Domain name cacheability analysis method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810720010.0A CN110677270B 2018-07-03 2018-07-03 Domain name cacheability analysis method and system

Publications (2)

Publication Number Publication Date
CN110677270A 2020-01-10
CN110677270B 2023-02-28

Family

ID=69065877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810720010.0A Active CN110677270B 2018-07-03 2018-07-03 Domain name cacheability analysis method and system

Country Status (1)

Country Link
CN (1) CN110677270B

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111741065A * 2020-05-18 2020-10-02 Beijing Zhizhen Technology Co., Ltd. Batch CDN resource cache automation device
CN114629919A * 2022-03-31 2022-06-14 Beijing Baidu Netcom Science and Technology Co., Ltd. Resource acquisition method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106921713A * 2015-12-25 2017-07-04 China Mobile Group Shanghai Co., Ltd. Resource caching method and device
US9723053B1 * 2013-08-30 2017-08-01 Amazon Technologies, Inc. Pre-fetching a cacheable network resource based on a time-to-live value
CN107153663A * 2016-03-04 2017-09-12 China Mobile Group Beijing Co., Ltd. Domain name resource caching method and device
CN107819837A * 2017-10-31 2018-03-20 Nanjing Yousu Network Technology Co., Ltd. Method for improving cache service quality, and log cache analysis system

Non-Patent Citations (1)

Title
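Li Kai et al.: "Construction of Automated Means for Autonomous Cache Operation" (Cache自主运营的自动化手段建设), Telecommunications Technology (电信技术) *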

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN111741065A * 2020-05-18 2020-10-02 Beijing Zhizhen Technology Co., Ltd. Batch CDN resource cache automation device
CN111741065B * 2020-05-18 2022-03-08 Beijing Zhizhen Technology Co., Ltd. Batch CDN resource cache automation device
CN114629919A * 2022-03-31 2022-06-14 Beijing Baidu Netcom Science and Technology Co., Ltd. Resource acquisition method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN110677270B 2023-02-28

Similar Documents

Publication Publication Date Title
US7093012B2 System and method for enhancing crawling by extracting requests for webpages in an information flow
US9081861B2 Uniform resource locator canonicalization
US7996397B2 Using network traffic logs for search enhancement
AU2001290363A1 A method for searching and analysing information in data networks
CN108055302B Picture caching processing method and system and server
US8041893B1 System and method for managing large filesystem-based caches
US20120284270A1 Method and device to detect similar documents
CN110430188B Rapid URL filtering method and device
KR100509276B1 Method for searching web page on popularity of visiting web pages and apparatus thereof
CA2369613A1 Selecting a cache
US20130185429A1 Processing Store Visiting Data
CN102752288A Method and device for identifying network access action
CN111831699B Data caching method, electronic equipment and computer readable medium
CN111368227B URL processing method and device
CN106649313B Method and apparatus for processing cache data
CN110677270B Domain name cacheability analysis method and system
Wills et al. Studying the impact of more complete server information on web caching
US9973950B2 Technique for data traffic analysis
Langhnoja et al. Web usage mining to discover visitor group with common behavior using DBSCAN clustering algorithm
Feng et al. An efficient caching mechanism for network-based url filtering by multi-level counting bloom filters
JP3664906B2 Information source observation apparatus, information source observation method, and recording medium storing a program for executing information source observation processing
US20040205049A1 Methods and apparatus for user-centered web crawling
CN109408479A Log data adding method, system, computer equipment and storage medium
JP2003271494A Information collection system, information collection method, information collection program and recording medium
JP2003173351A Method, device, program and storage medium for analysis, collection and retrieval of information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant