CN113852643B

CN113852643B - Content distribution network cache pollution defense method based on content popularity

Info

Publication number: CN113852643B
Application number: CN202111227105.7A
Authority: CN
Inventors: 朱笑岩; 樊甜甜; 韩雪雪; 冯鹏斌; 马建峰
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2021-10-21
Filing date: 2021-10-21
Publication date: 2023-11-14
Anticipated expiration: 2041-10-21
Also published as: CN113852643A

Abstract

The invention discloses a content distribution network cache pollution defense method based on content popularity, implemented in the following steps: 1. calculate the content popularity of all cache resources; 2. calculate the source-station hash values of all cache resources; 3. determine the hit cache resources; 4. calculate the hash value of each hit cache resource; 5. judge whether the hash value of the hit cache resource equals the source-station hash value stored by the cache server; if so, execute step 6, otherwise execute step 7; 6. determine the cache resource to be a benign resource and return it to the user; 7. update the malignant resource. The invention can detect all polluted resources. It returns originally cached benign resources to the user, and returns malignant resources to the user only after updating them with the latest resource from the source server, thereby ensuring that users access correct resources.

Description

Content distribution network cache pollution defense method based on content popularity

Technical Field

The invention belongs to the technical field of communication, and more specifically relates to a content distribution network cache pollution defense method based on content popularity in the field of network communication technology. The invention can be used to detect caches in a content distribution network that have been attacked by pollution and to clear the malignant cached resources.

Background

To meet the modern Internet's demand for fast communication, a content delivery network (Content Delivery Network, CDN) caches website content at the network "edge" closest to users, namely on cache servers, so that users can obtain the content they need from a nearby location. This addresses limited network bandwidth, high user access volume, and the uneven distribution of websites, and greatly improves the response speed when users access websites. Consequently, the network resources cached on a content distribution network have become a target for attackers seeking to compromise network security. Common cache pollution attacks, such as cache poisoning, replace the cached resources on a content distribution network cache server so that the server returns harmful files to users or denies service. Each existing defense against cache pollution is designed for one particular attack and is limited to that attack mode, so existing defenses cannot cope with the variety of cache pollution attacks.

The patent application "An attack defense method based on a content distribution network" filed by Hangzhou Seaman Science and Technology Company (application number 202110178012.3, application publication number CN113037716A) discloses an attack defense method based on a content distribution network. The method comprises the following steps: (1) setting edge nodes in the content distribution network; (2) determining the number of high-protection groups according to the number of links, and establishing high-protection cluster groups; (3) after the domain name is resolved to the content distribution network, setting thresholds for the request count and bandwidth at each edge node and link of the content distribution network, and performing exception handling when the request count or the traffic of an edge node exceeds its threshold; (4) after traffic is switched over to the high-protection clusters, separately monitoring the request count and traffic of each high-protection IP in the clusters, while also monitoring the request count and traffic of the affected edge nodes, links, and IPs in the content distribution network. The drawback of this method is that it only handles request counts or traffic that exceed the thresholds at edge nodes of the content distribution network; requests and traffic below the thresholds are neither analyzed nor inspected.

The patent application "Content distribution network security detection method and system" filed by a communications joint-stock limited company (application number 201710882559.5, application publication number CN109561051A) discloses a content distribution network security detection method and system. The method comprises the following steps: (1) acquiring the network traffic data copied from content distribution network nodes to obtain whole-network traffic data; (2) performing security analysis on the acquired whole-network traffic data according to preset intrusion detection rules, and judging whether malicious resources exist; and (3) determining whether to raise a security alarm according to the analysis result. The drawback of this method is that it only identifies and warns about malicious resources without processing them, so a user who mistakenly accesses a malicious resource still cannot obtain the required web resource.

Disclosure of Invention

The object of the invention is to overcome the above deficiencies of the prior art by providing a content distribution network cache pollution defense method based on content popularity. It solves two problems of the prior art: relying only on thresholds detects just part of the malicious resources and so risks missed detections; and because malicious resources are not processed, a user who mistakenly accesses one cannot obtain the required web resource, and the user request cannot be answered correctly.

The idea of the invention for achieving the above object is as follows: the cache server calculates the content popularity of all resources for which user requests hit the cache, and stores the source-station hash values of all these resources in order of content popularity; it identifies the hit cache resources from the header information of the responses returned to users, and then compares the hash value of each hit cache resource with the stored source-station hash value to find all polluted resources. Depending on the attribute of the hit cache resource, either the originally cached benign resource is returned to the user, or the malignant resource is first updated with the latest resource from the source server and then returned to the user.

The technical scheme of the invention comprises the following steps:

step 1, calculating the content popularity of all cache resources:

the content popularity of each resource in the cache server, together with the website to which each resource belongs, is calculated for each time period according to the following formula:

where $P(i,j)$ denotes the content popularity of the i-th resource in the cache server, which belongs to the j-th website; $\omega$ denotes the set coefficient of the content popularity $P(i,j)$, i.e., the weight used when calculating $P(i,j)$, a constant taken from the range $[0, 0.5]$; $N_i$ denotes the number of times the i-th resource of the cache server is requested by users in the T-th time period; $N$ denotes the total number of resources requested from the cache server in the T-th time period; $k$ denotes the serial number of a requested resource in the T-th time period; $N_k$ denotes the number of times the k-th resource is requested by users in the T-th time period; $\sum$ denotes summation; $j$ denotes the serial number of the website to which the i-th resource belongs; and $m$ denotes the total number of websites to which all resources requested from the cache server by users in the T-th time period belong;
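The formula for P(i, j) is not reproduced in this text, so the following is only an illustration of how the defined quantities could be combined. It assumes, without confirmation from the source, that P(i, j) is a weighted sum of the resource's request share $N_i / \sum_{k=1}^{N} N_k$ and the normalized website serial number $j/m$, with ω weighting the website term; the function and parameter names are hypothetical.

```python
from typing import Dict


def content_popularity(
    request_counts: Dict[str, int],  # N_i for every resource requested in period T
    site_serial: Dict[str, int],     # serial number j of each resource's website
    num_sites: int,                  # m, number of websites involved in period T
    omega: float = 0.3,              # weight, required to lie in [0, 0.5]
) -> Dict[str, float]:
    """ASSUMED form, not taken from the patent:
    P(i, j) = (1 - omega) * N_i / sum_k N_k + omega * j / m
    """
    if not 0.0 <= omega <= 0.5:
        raise ValueError("omega must lie in [0, 0.5]")
    total = sum(request_counts.values())  # sum over k of N_k
    if total == 0 or num_sites <= 0:
        # No requests (or no websites) in this period: every popularity is zero
        return {name: 0.0 for name in request_counts}
    return {
        name: (1 - omega) * count / total + omega * site_serial[name] / num_sites
        for name, count in request_counts.items()
    }
```

For example, with request counts of 30 and 10 for two resources whose websites have serial numbers 2 and 1 (m = 2) and ω = 0.5, this assumed form gives popularities 0.875 and 0.375.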

step 2, calculating the source-station hash values of all cache resources:

sorting all content popularity values from largest to smallest, calculating in turn the source-station hash value of the cache resource corresponding to each content popularity value, and storing each source-station hash value in the cache server as a set of key-value pairs;

step 3, determining hit cache resources:

determining a response resource whose "X-Cache" field in the header information of the response to the user has the value "HIT" as a cache resource that hit the cache server;

step 4, calculating a hash value of the hit cache resource;

step 5, judging whether the hash value of the hit cache resource is equal to the hash value of the source station stored by the cache server, if so, executing step 6, otherwise, executing step 7;

step 6, determining the cache resource to be a benign resource and returning the benign resource to the user;

step 7, updating malignant resources:

determining the cache resource to be a malignant resource, updating it with the latest resource from the source server, and returning the updated resource to the user.
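Steps 4 to 7 can be sketched as a single function. This is an illustration under assumptions: MD5 as the hash (the embodiment quotes an MD5 value), an in-memory dict as the cache, and a caller-supplied fetch_from_origin callable standing in for the request to the source server; all names are hypothetical. Refreshing the stored source-station hash after an update is also an assumption not stated in the text; without it, a resource whose origin copy had legitimately changed would be flagged again on every later hit.

```python
import hashlib
from typing import Callable, Dict


def md5_hex(content: bytes) -> str:
    # Upper-case hex, matching the style of the MD5 value quoted in the embodiment
    return hashlib.md5(content).hexdigest().upper()


def serve_hit(
    name: str,
    cache: Dict[str, bytes],                    # hypothetical cache: name -> cached bytes
    origin_hashes: Dict[str, str],              # stored source-station hashes from step 2
    fetch_from_origin: Callable[[str], bytes],  # hypothetical fetch from the source server
) -> bytes:
    cached = cache[name]
    # Steps 4-5: hash the hit resource and compare it with the stored hash
    # (a resource with no stored hash is treated as a mismatch)
    if md5_hex(cached) == origin_hashes.get(name, "").upper():
        return cached  # Step 6: benign, return the cached copy unchanged
    # Step 7: malignant, replace the cached copy with the latest origin copy
    fresh = fetch_from_origin(name)
    cache[name] = fresh
    origin_hashes[name] = md5_hex(fresh)  # assumption: keep the stored hash current
    return fresh
```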

Compared with the prior art, the invention has the following advantages:

First, the invention calculates and stores the source-station hash values of all resources in order of the content popularity of all cache resources, identifies the hit cache resources from the header information of the responses to users, and compares the hash value of each hit cache resource with the stored source-station hash value to detect all polluted resources. This overcomes the risk of missed detection in the prior art, giving the invention the advantage of detecting all malicious resources in the cache server.

Second, by judging the attribute of the hit cache resource, the invention either returns the originally cached benign resource to the user or returns the malignant resource to the user after updating it with the latest resource from the source server. This overcomes the prior-art problems that malicious resources are only identified and warned about but not processed, so that user requests cannot be answered correctly, giving the invention the advantage of guaranteeing that users access correct resources.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a flow chart of an embodiment of the present invention.

Detailed Description

The invention is further described below with reference to the drawings and examples.

The specific steps for implementing the present invention are further described with reference to FIG. 1.

Step 1, calculating the content popularity of all cache resources.

where $P(i,j)$ denotes the content popularity of the i-th resource in the cache server, which belongs to the j-th website; $\omega$ denotes the set coefficient of the content popularity $P(i,j)$, i.e., the weight used when calculating $P(i,j)$, a constant taken from the range $[0, 0.5]$; $N_i$ denotes the number of times the i-th resource of the cache server is requested by users in the T-th time period; $N$ denotes the total number of resources requested from the cache server in the T-th time period; $k$ denotes the serial number of a requested resource in the T-th time period; $N_k$ denotes the number of times the k-th resource is requested by users in the T-th time period; $\sum$ denotes summation; $j$ denotes the serial number of the website to which the i-th resource belongs; and $m$ denotes the total number of websites to which all resources requested from the cache server by users in the T-th time period belong.

The serial number of the website to which a resource belongs is obtained by collecting data in the order of the Alexa global comprehensive website-traffic ranking as $[x_1, m], \dots, [x_j, m-j+1], \dots, [x_m, 1]$, where $x_j$ denotes the name of the website to which the i-th resource belongs. That is, the website in position j of the ranking is assigned serial number $m-j+1$, so the highest-ranked website receives m and the lowest-ranked website receives 1.
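The serial-number assignment above can be sketched as follows, assuming the caller supplies the website names already ordered from highest to lowest traffic rank, with no duplicates; the function name is hypothetical.

```python
from typing import Dict, List


def site_serial_numbers(sites_by_rank: List[str]) -> Dict[str, int]:
    """Map [x_1, ..., x_m], best-ranked first, to the serial numbers m - j + 1."""
    m = len(sites_by_rank)
    # Position j starts at 1, so the top site gets m and the last site gets 1
    return {site: m - j + 1 for j, site in enumerate(sites_by_rank, start=1)}
```

For example, site_serial_numbers(["a.example", "b.example", "c.example"]) returns {"a.example": 3, "b.example": 2, "c.example": 1}.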

Step 2, calculating the source-station hash values of all cache resources.

Sort all content popularity values from largest to smallest, calculate in turn the source-station hash value of the cache resource corresponding to each content popularity value, and store each source-station hash value in the cache server as a set of key-value pairs.

The set of key-value pairs has the form $[N_1, H_1], \dots, [N_t, H_t], \dots, [N_n, H_n]$, where $N_t$ denotes the resource name of the t-th resource in the cache server, t ranges over $[1, n]$, and $H_t$ denotes the source-station hash value of the resource with the t-th resource name in the cache server.
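A minimal sketch of step 2 and the key-value set it produces, assuming MD5 as the hash (the hash that appears in the embodiment) and assuming the origin copies of the resources have already been fetched; the function name build_source_hash_store is hypothetical. A Python dict stands in for the set of $[N_t, H_t]$ pairs; because dicts preserve insertion order, iterating the result yields the pairs from most to least popular.

```python
import hashlib
from typing import Dict, Mapping


def build_source_hash_store(
    popularity: Mapping[str, float],      # resource name N_t -> content popularity
    origin_content: Mapping[str, bytes],  # resource name -> bytes from the source station
) -> Dict[str, str]:
    store: Dict[str, str] = {}
    # Visit resources from most to least popular (ties keep their input order)
    for name in sorted(popularity, key=popularity.get, reverse=True):
        store[name] = hashlib.md5(origin_content[name]).hexdigest().upper()  # H_t
    return store
```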

Step 3, determining the hit cache resources.

Determine a response resource whose "X-Cache" field in the header information of the response to the user has the value "HIT" as a cache resource that hit the cache server.

Step 4, calculating the hash value of the hit cache resource.

Step 5, judging whether the hash value of the hit cache resource equals the source-station hash value stored by the cache server; if so, execute step 6, otherwise execute step 7.

Step 6, determining the cache resource to be a benign resource and returning the benign resource to the user.

Step 7, updating the malignant resource.

The specific steps of an embodiment of the present invention are further described with reference to FIG. 2.

First step, calculate the content popularity of each cache resource that the user requires and that is stored in the user's edge cache server;

Second step, calculate the source hash value of each cache resource that the user requires and that is stored in the user's edge cache server, and store the calculated source hash values in descending order of content popularity;

Third step, the user sends a URL (Uniform Resource Locator) request to the cache server;

Fourth step, the cache server determines whether the user's URL request hits a cache resource. If the URL request contains a random number, as the user URL request (url=test.jnumber=math.range ()) does in this embodiment, the resource is being requested directly from the server as the latest resource; it cannot have been polluted, and no further processing is needed. Otherwise, the cache server checks the "X-Cache" field in the header information of the response resource returned to the user: a value of "MISS" indicates that the response resource did not hit a cache resource of the cache server and was returned from the source server, so no further processing is needed; a value of "HIT" indicates that the response resource hit a cache resource;
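The three-way decision in the fourth step can be sketched as a small classifier. This is a hypothetical illustration: it assumes that a request "contains a random number" when it carries a cache-busting query parameter, and the parameter name number is an assumption read from the embodiment's URL; the Action names are invented for the sketch. X-Cache is a non-standard header whose exact values vary between cache products; the sketch compares the whole value with "HIT", as the embodiment does, and treats anything else (including a missing header) as a pass-through.

```python
from enum import Enum
from typing import Mapping
from urllib.parse import parse_qs, urlsplit


class Action(Enum):
    BYPASS = "bypass"  # cache-busting request, served fresh from the source server
    PASS = "pass"      # X-Cache is not HIT, response came from the source server
    VERIFY = "verify"  # X-Cache: HIT, the cached resource must be hash-checked


def classify(url: str, response_headers: Mapping[str, str],
             random_param: str = "number") -> Action:
    # Assumption: the random number travels in a query parameter named random_param
    # (parse_qs ignores parameters with empty values by default)
    if random_param in parse_qs(urlsplit(url).query):
        return Action.BYPASS
    # HTTP header names are case-insensitive, so match "X-Cache" without regard to case
    x_cache = next((v for k, v in response_headers.items() if k.lower() == "x-cache"), "")
    return Action.VERIFY if x_cache.strip().upper() == "HIT" else Action.PASS
```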

Fifth step, calculate the hash value of the hit cache resource; in this embodiment, the hash value of the hit cache resource (test.jsp) is (MD5: 99B05058C3848023AD83760A61DB9FF 25);

Sixth step, compare the hash value of the hit cache resource with the stored source-station hash value; if the two values are equal, execute the seventh step, otherwise execute the eighth step;

Seventh step, return the benign resource directly to the user;

Eighth step, update the malignant resource: the cache server forwards the user request to the source server to obtain the latest resource, replaces the malignant resource with the latest resource, and returns it to the user.

Claims

1. A content distribution network cache pollution prevention method based on content popularity, characterized in that a cache server calculates the content popularity of the cached resources, calculates and stores the source hash values of all resources in order of content popularity, identifies the hit cache resources and compares the hash value of each hit cache resource with the stored source hash value, and responds to the user request after determining the attribute of the hit cache resource; the defense method comprises the following steps:

step 1, calculating the content popularity of all cache resources:

where $P(i,j)$ denotes the content popularity of the i-th resource in the cache server, which belongs to the j-th website; $\omega$ denotes the set coefficient of the content popularity $P(i,j)$, i.e., the weight used when calculating $P(i,j)$, a constant taken from the range $[0, 0.5]$; $N_i$ denotes the number of times the i-th resource of the cache server is requested by users in the T-th time period; $N$ denotes the total number of resources requested from the cache server in the T-th time period; $k$ denotes the serial number of a requested resource in the T-th time period; $N_k$ denotes the number of times the k-th resource is requested by users in the T-th time period; $\sum$ denotes summation; $j$ denotes the serial number of the website to which the i-th resource belongs; and $m$ denotes the total number of websites to which all resources requested from the cache server by users in the T-th time period belong;

step 3, determining hit cache resources:

step 4, calculating a hash value of the hit cache resource;

step 7, updating malignant resources:

2. The content distribution network cache pollution prevention method based on content popularity of claim 1, wherein the serial number of the website to which the resource belongs in step 1 is obtained by collecting data in the order of the Alexa global comprehensive website-traffic ranking as $[x_1, m], \dots, [x_j, m-j+1], \dots, [x_m, 1]$, where $x_j$ denotes the name of the website to which the i-th resource belongs.

3. The content distribution network cache pollution prevention method based on content popularity of claim 1, wherein the set of key-value pairs in step 2 has the form $[N_1, H_1], \dots, [N_t, H_t], \dots, [N_n, H_n]$, where $N_t$ denotes the resource name of the t-th resource in the cache server, t ranges over $[1, n]$, and $H_t$ denotes the source-station hash value of the resource with the t-th resource name in the cache server.