CN114500529A - A cloud-edge collaborative caching method and system based on perceived redundancy - Google Patents

A cloud-edge collaborative caching method and system based on perceived redundancy Download PDF

Info

Publication number
CN114500529A
CN114500529A CN202111631200.3A CN202111631200A CN114500529A CN 114500529 A CN114500529 A CN 114500529A CN 202111631200 A CN202111631200 A CN 202111631200A CN 114500529 A CN114500529 A CN 114500529A
Authority
CN
China
Prior art keywords
data
edge
edge server
cache
caching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111631200.3A
Other languages
Chinese (zh)
Inventor
王艳广
李一泠
王冲
李龙鸣
符传杰
施展
王夏菁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Science And Technology Network Information Development Co ltd
Original Assignee
Aerospace Science And Technology Network Information Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Science And Technology Network Information Development Co ltd filed Critical Aerospace Science And Technology Network Information Development Co ltd
Priority to CN202111631200.3A priority Critical patent/CN114500529A/en
Publication of CN114500529A publication Critical patent/CN114500529A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004Server selection for load balancing
    • H04L67/1008Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本发明公开一种基于可感知冗余的云边协同缓存方法及系统,涉及云边协同缓存技术领域,以解决现有的云边协同缓存方法命中率低、访问延迟长的问题。所述云边协同缓存方法在每一周期内,执行以下操作:基于上一周期的数据访问信息,利用基于可感知冗余的协作式边缘数据缓存策略将数据集合中的数据选择性缓存于多个边缘服务器。在接收到数据访问请求后,利用基于可感知冗余的替换策略对各个边缘服务器的缓存空间进行管理。通过设计基于可感知冗余的协作式边缘数据缓存策略和基于可感知冗余的替换策略,以权衡访问边缘服务器和访问云服务器获取缺失数据产生的额外开销,提高边缘服务器空间利用率,提高系统命中率,并降低数据访问延迟。

Figure 202111631200

The invention discloses a cloud-edge collaborative caching method and system based on perceptible redundancy, and relates to the technical field of cloud-edge collaborative caching to solve the problems of low hit rate and long access delay in the existing cloud-edge collaborative caching method. The cloud-edge collaborative caching method performs the following operations in each cycle: based on the data access information of the previous cycle, using a cooperative edge data caching strategy based on perceived redundancy to selectively cache the data in the data set in multiple locations; edge server. After receiving a data access request, the cache space of each edge server is managed using a replacement strategy based on perceived redundancy. By designing a cooperative edge data caching strategy based on perceived redundancy and a replacement strategy based on perceived redundancy, we can balance the extra overhead of accessing edge servers and accessing cloud servers to obtain missing data, improve the space utilization of edge servers, and improve the system. hit rate and reduce data access latency.

Figure 202111631200

Description

一种基于可感知冗余的云边协同缓存方法及系统A cloud-edge collaborative caching method and system based on perceived redundancy

技术领域technical field

本发明涉及云边协同缓存技术领域,尤其涉及一种基于可感知冗余的云边协同缓存方法 及系统。The present invention relates to the technical field of cloud-edge collaborative caching, in particular to a cloud-edge collaborative caching method and system based on perceived redundancy.

背景技术Background technique

在传统集中式云架构下,多个用户访问相同内容时,云数据中心需要给每个用户发送相 同的数据,导致网络内存在海量重复数据,进而产生网络拥塞。一个很自然的想法是在靠近 用户的边缘侧启用缓存,使得对于流行数据的访问请求尽可能由边缘端处理,从而减轻云服 务器的带宽压力。为了弥补云架构的不足,边缘计算的概念应运而生。在边缘计算模式下, 可以将计算和存储能力下移到靠近用户的边缘服务器,使得网络边缘也有了数据处理和数据 缓存的能力。边缘缓存技术着眼于使用边缘服务器缓存云数据中心的热门内容,从而缓解云 服务器的带宽压力,降低数据访问延迟。Under the traditional centralized cloud architecture, when multiple users access the same content, the cloud data center needs to send the same data to each user, resulting in massive duplicate data in the network, resulting in network congestion. A natural idea is to enable caching on the edge side close to users, so that access requests for popular data are handled by the edge side as much as possible, thereby reducing the bandwidth pressure on cloud servers. In order to make up for the insufficiency of cloud architecture, the concept of edge computing came into being. In the edge computing mode, computing and storage capabilities can be moved down to edge servers close to users, so that the network edge also has data processing and data caching capabilities. Edge caching technology focuses on using edge servers to cache popular content in cloud data centers, thereby relieving the bandwidth pressure of cloud servers and reducing data access latency.

对边缘缓存的管理需要考虑缓存放置策略和缓存替换策略两方面。缓存放置策略的主要 目标在于平衡访问边缘协作域与访问云服务器。与边缘服务器间的数据传输开销相比,对云 服务器的访问会带来更大的数据传输开销。为了最大程度减轻云服务器的负载,通常需要对 协作缓存域内的边缘服务器进行数据去重操作。然而,数据去重可以最大程度的减少对云服 务器的访问,但也会使得大量数据访问请求无法由本地边缘服务器响应。考虑到边缘服务器 带宽资源有限,边缘服务器间的数据传输开销不可忽略,流行数据在边缘服务器间的频繁传 输,会产生较大的协作开销,严重影响整体性能。极端的去重会影响访问性能,边缘存储系 统需要适当的冗余。The management of edge cache needs to consider two aspects of cache placement strategy and cache replacement strategy. The main goal of the cache placement strategy is to balance access to edge collaboration domains and access to cloud servers. Compared with the data transmission overhead between edge servers, access to cloud servers will bring greater data transmission overhead. In order to reduce the load of the cloud server to the greatest extent, it is usually necessary to perform data deduplication operations on the edge servers in the cooperative cache domain. However, data deduplication can minimize the access to the cloud server, but also make a large number of data access requests unable to be responded by the local edge server. Considering the limited bandwidth resources of edge servers, the data transmission overhead between edge servers cannot be ignored, and the frequent transmission of popular data between edge servers will generate a large collaboration overhead and seriously affect the overall performance. Extreme deduplication can affect access performance, and edge storage systems require appropriate redundancy.

对缓存替换策略的研究同样由来已久,考虑不同影响因素可以设计合适的缓存替换策略 完成缓存内容更新操作。但上述缓存替换算法仅适用于单个服务器的缓存空间管理,不适用 于协作场景下多边缘服务器的缓存空间管理。主要原因是边缘服务器独立进行缓存替换决策 时,每个边缘服务器都会倾向于保留最流行的内容,流行数据会被缓存于几乎所有边缘服务 器。在这种情况下,边缘端空间利用率大幅降低,系统命中率下降,数据访问延迟升高。The research on the cache replacement strategy also has a long history. Considering different influencing factors, an appropriate cache replacement strategy can be designed to complete the cache content update operation. However, the above cache replacement algorithm is only applicable to the cache space management of a single server, and is not applicable to the cache space management of multi-edge servers in collaborative scenarios. The main reason is that when edge servers make cache replacement decisions independently, each edge server tends to keep the most popular content, and popular data is cached on almost all edge servers. In this case, the space utilization at the edge is greatly reduced, the system hit rate is reduced, and the data access delay is increased.

因此,亟需一种能够有效降低执行时间并提高缓存命中率的方法及系统。Therefore, there is an urgent need for a method and system that can effectively reduce the execution time and improve the cache hit rate.

发明内容SUMMARY OF THE INVENTION

本发明的目的在于提供一种基于可感知冗余的云边协同缓存方法及系统,设计基于可感 知冗余的协作式边缘数据缓存策略和基于可感知冗余的替换策略,以权衡访问边缘服务器和 访问云服务器获取缺失数据产生的额外开销,提高边缘服务器空间利用率,提高系统命中率, 并降低数据访问延迟。The purpose of the present invention is to provide a cloud-edge collaborative caching method and system based on perceptible redundancy, and design a cooperative edge data caching strategy based on perceptible redundancy and a replacement strategy based on perceptible redundancy to balance access to edge servers. And the additional overhead of accessing cloud servers to obtain missing data, improve the space utilization of edge servers, improve the system hit rate, and reduce data access latency.

为了实现上述目的,本发明提供如下技术方案:In order to achieve the above object, the present invention provides the following technical solutions:

一种基于可感知冗余的云边协同缓存方法,所述云边协同缓存方法包括:A cloud-edge collaborative caching method based on perceived redundancy, the cloud-edge collaborative caching method comprising:

在每一周期内,执行以下操作:During each cycle, do the following:

基于上一周期的数据访问信息,利用基于可感知冗余的协作式边缘数据缓存策略将数据 集合中的数据选择性缓存于多个边缘服务器;所有所述边缘服务器构成协作缓存域;Based on the data access information of the previous cycle, the data in the data set is selectively cached in a plurality of edge servers using a cooperative edge data caching strategy based on perceived redundancy; all the edge servers constitute a cooperative cache domain;

在接收到数据访问请求后,利用基于可感知冗余的替换策略对各个所述边缘服务器的缓 存空间进行管理。After the data access request is received, the cache space of each of the edge servers is managed using a replacement strategy based on perceived redundancy.

与现有技术相比,本发明提供的一种基于可感知冗余的云边协同缓存方法,在每一周期 内,执行以下操作:基于上一周期的数据访问信息,利用基于可感知冗余的协作式边缘数据 缓存策略将数据集合中的数据选择性缓存于多个边缘服务器;所有所述边缘服务器构成协作 缓存域,在接收到数据访问请求后,利用基于可感知冗余的替换策略对各个所述边缘服务器 的缓存空间进行管理。通过设计基于可感知冗余的协作式边缘数据缓存策略和基于可感知冗 余的替换策略,以权衡访问边缘服务器和访问云服务器获取缺失数据产生的额外开销,提高 边缘服务器空间利用率,提高系统命中率,并降低数据访问延迟。Compared with the prior art, the present invention provides a cloud-edge collaborative caching method based on perceptible redundancy. In each cycle, the following operations are performed: The cooperative edge data caching strategy of the data set selectively caches the data in the data set in multiple edge servers; all the edge servers constitute a cooperative caching domain, and after receiving a data access request, the replacement strategy based on perceived redundancy is used to The cache space of each of the edge servers is managed. By designing a cooperative edge data caching strategy based on perceived redundancy and a replacement strategy based on perceived redundancy, we can balance the extra overhead of accessing edge servers and accessing cloud servers to obtain missing data, improve the space utilization of edge servers, and improve the system. hit rate and reduce data access latency.

一种基于可感知冗余的云边协同缓存系统,所述云边协同缓存系统包括云数据中心、边 缘服务器集群和控制器;所述云数据中心和所述边缘服务器集群均与所述控制器通信连接; 所述边缘服务器集群为由多个边缘服务器组成的协作缓存域;每一所述边缘服务器的缓存空 间均被分为冗余区域和独占区域;A cloud-edge collaborative caching system based on perceived redundancy, the cloud-edge collaborative caching system includes a cloud data center, an edge server cluster, and a controller; the cloud data center and the edge server cluster are both connected to the controller. communication connection; the edge server cluster is a cooperative cache domain composed of multiple edge servers; the cache space of each edge server is divided into a redundant area and an exclusive area;

所述控制器用于执行上述的云边协同缓存方法。The controller is configured to execute the above-mentioned cloud-edge collaborative caching method.

与现有技术相比,本发明提供的云边协同缓存系统的有益效果与上述技术方案所述云边 协同缓存方法的有益效果相同,此处不做赘述。Compared with the prior art, the beneficial effects of the cloud-edge collaborative caching system provided by the present invention are the same as the beneficial effects of the cloud-edge collaborative caching method described in the above technical solutions, which will not be repeated here.

附图说明Description of drawings

此处所说明的附图用来提供对本发明的进一步理解,构成本发明的一部分,本发明的示 意性实施例及其说明用于解释本发明,并不构成对本发明的不当限定。在附图中:The accompanying drawings described herein are used to provide a further understanding of the present invention and constitute a part of the present invention. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention, and do not constitute an improper limitation of the present invention. In the attached image:

图1为本发明实施例1所提供的云边协同缓存系统的结构示意图;1 is a schematic structural diagram of a cloud-edge collaborative caching system provided by Embodiment 1 of the present invention;

图2为本发明实施例1所提供的边缘服务器缓存空间的划分示意图;FIG. 2 is a schematic diagram of the division of the edge server cache space provided by Embodiment 1 of the present invention;

图3为本发明实施例1所提供的基于可感知冗余的替换策略的流程示意图;3 is a schematic flowchart of a perceptible redundancy-based replacement strategy provided in Embodiment 1 of the present invention;

图4为本发明实施例1所提供的实验所用数据的流行度分布图;Fig. 4 is the popularity distribution diagram of the data used in the experiment provided by the embodiment of the present invention 1;

图5为本发明实施例1所提供的两种负载下三种缓存放置策略的性能表现示意图;5 is a schematic diagram of performance performance of three cache placement strategies under two loads provided in Embodiment 1 of the present invention;

图6为本发明实施例1所提供的两种负载下三种缓存替换策略的性能表现示意图;6 is a schematic diagram of performance performance of three cache replacement strategies under two loads provided in Embodiment 1 of the present invention;

图7为本发明实施例2所提供的云边协同缓存方法的流程示意图。FIG. 7 is a schematic flowchart of a cloud-edge collaborative caching method provided by Embodiment 2 of the present invention.

具体实施方式Detailed ways

为了便于清楚描述本发明实施例的技术方案,在本发明的实施例中,“示例性的”或者 “例如”等词用于表示作例子、例证或说明。本发明中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其他实施例或设计方案更优选或更具优势。确切而 言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念。In order to clearly describe the technical solutions of the embodiments of the present invention, in the embodiments of the present invention, words such as "exemplary" or "for example" are used to represent examples, illustrations, or illustrations. Any embodiment or design described herein as "exemplary" or "such as" should not be construed as preferred or advantageous over other embodiments or designs. Rather, use of words such as "exemplary" or "such as" is intended to present the related concepts in a specific manner.

实施例1:Embodiment 1:

对于基础设施供应商而言,边缘缓存技术能提高边缘端存储资源利用率,带来更高的经 济效益;对于应用提供商而言,边缘缓存技术可辅助其在边缘端放置应用数据,在满足既定 服务质量要求的前提下最小化存储成本;对于用户而言,边缘缓存技术能在数据量指数级增 长的背景下,优化用户的数据访问体验。有鉴于此,边缘缓存技术日益成为研究人员关注的 热点。边缘缓存技术通常用于辅助内容分发网络(Content Del iveryNetwork,CDN)进行内 容缓存,比起CDN服务器,边缘服务器更靠近用户设备,且部署密度远超过CDN服务器,故 在边缘端启用缓存可以满足海量设备的低延迟数据访问需求。但边缘服务器存储资源远不如 CDN服务器,需要进行细粒度的缓存空间管理以满足海量用户的低延迟数据访问需求。For infrastructure providers, edge caching technology can improve the utilization of edge storage resources and bring higher economic benefits; for application providers, edge caching technology can help them place application data at the edge. Minimize storage costs under the premise of given service quality requirements; for users, edge caching technology can optimize users' data access experience in the context of exponential growth in data volume. In view of this, edge caching technology has increasingly become the focus of researchers' attention. Edge caching technology is usually used to assist Content Delivery Network (CDN) for content caching. Compared with CDN servers, edge servers are closer to user equipment, and the deployment density is much higher than that of CDN servers. Therefore, enabling caching at the edge can meet the needs of massive The low-latency data access needs of the device. However, the storage resources of edge servers are far less than that of CDN servers, and fine-grained cache space management is required to meet the low-latency data access requirements of a large number of users.

对边缘缓存的管理需要考虑缓存放置策略和缓存替换策略两方面。The management of edge cache needs to consider two aspects of cache placement strategy and cache replacement strategy.

考虑到参与数据去重的节点越多,边缘端存储空间利用率越高,但是远距离边缘服务器 之间的协作会产生较大的网络开销,反而增大数据访问延迟。有的研究者提出了边缘数据去 重(Edge facilitated deduplication,EF-dedup)策略,将若干边缘服务器划分为不相交的 环,对于每个环进行分布式去重,并设置环内最佳节点数用来权衡存储开销和网络开销;有 的研究者考虑到冷热数据流行度差异巨大,将基于流行度的热数据缓存和基于空间利用率最 大化的数据去重结合起来,最大程度上减少云服务器访问频率。虽然数据去重可以最大程度 的减少对云服务器的访问,但也会使得大量数据访问请求无法由本地边缘服务器响应。考虑 到边缘服务器带宽资源有限,边缘服务器间的数据传输开销不可忽略,流行数据在边缘服务 器间的频繁传输,会产生较大的协作开销,严重影响整体性能。Considering that the more nodes participating in data deduplication, the higher the storage space utilization at the edge end, but the cooperation between the long-distance edge servers will generate a large network overhead, which will increase the data access delay. Some researchers have proposed an edge data deduplication (EF-dedup) strategy, which divides several edge servers into disjoint rings, performs distributed deduplication for each ring, and sets the optimal number of nodes in the ring. It is used to weigh storage overhead and network overhead; some researchers consider the huge difference in popularity of hot and cold data, and combine popularity-based hot data caching with data deduplication based on maximizing space utilization to minimize cloud storage. Frequency of server access. Although data deduplication can minimize the access to the cloud server, it will also make a large number of data access requests unable to be responded by the local edge server. Considering the limited bandwidth resources of edge servers, the data transmission overhead between edge servers cannot be ignored. The frequent transmission of popular data between edge servers will result in a large collaboration overhead, which will seriously affect the overall performance.

经典缓存替换策略如最近最少使用(Least Recently Used,LRU)和最不经常使用(Least Frequently Used,LFU),分别通过考虑数据访问时间间隔、数据访问频率等因素更新缓存内 容;ARC通过引入ghost缓存来综合考虑数据访问时间间隔和数据访问频率,从而进行缓存 内容更新;GDS(Greedy Dual Size)引入了数据获取成本这一因素进行缓存内容更新;GDS-LF 在GDS的基础上扩展了GDS的优先级计算函数,实现了多成本感知的目的;Qaca考虑IO请 求到达率的影响,在共享缓存的场景下为多个用户实现公平性保障;MPC将数据按照数据流 行度进行排序,新数据到达时淘汰流行度最低的数据;又考虑到新内容添加率高,而大部分 内容仅接收少数几次访问请求,这类临时内容的添加会导致缓存空间的污染,故有研究人员 提出基于数据访问间隔预测短期流行度的缓存替换策略,来减少缓存更新频率。但现有缓存 替换算法仅适用于单个服务器的缓存空间管理,不适用于协作场景下多边缘服务器的缓存空 间管理。主要原因是边缘服务器独立进行缓存替换决策时,每个边缘服务器都会倾向于保留 最流行的内容,流行数据会被缓存于几乎所有边缘服务器。在这种情况下,边缘端空间利用 率大幅降低,系统命中率下降,数据访问延迟升高。Classic cache replacement strategies, such as Least Recently Used (LRU) and Least Frequently Used (LFU), update cache content by considering data access interval, data access frequency and other factors respectively; ARC introduces ghost cache by introducing To comprehensively consider the data access time interval and data access frequency, so as to update the cache content; GDS (Greedy Dual Size) introduces the factor of data acquisition cost to update the cache content; GDS-LF expands the priority of GDS on the basis of GDS The high-level computing function realizes the purpose of multi-cost perception; Qaca considers the impact of IO request arrival rate, and realizes fairness guarantee for multiple users in the scenario of shared cache; MPC sorts data according to data popularity, and when new data arrives Eliminate the data with the lowest popularity; considering that the rate of adding new content is high, and most of the content only receives a few access requests, the addition of such temporary content will lead to the pollution of the cache space, so some researchers propose that based on the data access interval A cache replacement strategy that predicts short-term popularity to reduce cache update frequency. However, the existing cache replacement algorithm is only suitable for the cache space management of a single server, and is not suitable for the cache space management of multiple edge servers in a collaborative scenario. The main reason is that when edge servers make cache replacement decisions independently, each edge server tends to keep the most popular content, and popular data is cached on almost all edge servers. In this case, the space utilization at the edge is greatly reduced, the system hit rate is reduced, and the data access delay is increased.

针对现有技术的缺陷,本实施例用于提供一种基于可感知冗余的云边协同缓存系统,通 过限制冗余数据最大可用缓存空间来保障空间利用率。如图1所示,所述云边协同缓存系统 包括云数据中心、边缘服务器集群和控制器,云数据中心和边缘服务器集群均与控制器通信 连接。在边缘缓存系统中,终端设备首先向距离其最近的本地边缘服务器请求数据,边缘服 务器为其覆盖范围内的终端设备提供缓存服务,缓存系统由多个彼此协作的边缘服务器和云 数据中心共同组成。每个边缘服务器均可用于流行数据缓存,多个边缘服务器组成协作缓存 域,边缘服务器之间可以共享缓存内容。若请求数据未被缓存于任一边缘服务器,则需要访 问云数据中心将缺失数据添加到边缘服务器。In view of the defects of the prior art, this embodiment is used to provide a cloud-edge collaborative caching system based on perceived redundancy, which ensures space utilization by limiting the maximum available cache space for redundant data. As shown in Figure 1, the cloud-edge collaborative caching system includes a cloud data center, an edge server cluster and a controller, and both the cloud data center and the edge server cluster are connected in communication with the controller. In an edge caching system, a terminal device first requests data from the local edge server closest to it, and the edge server provides caching services for terminal devices within its coverage. The caching system is composed of multiple edge servers and cloud data centers that cooperate with each other. . Each edge server can be used to cache popular data, multiple edge servers form a cooperative cache domain, and cache content can be shared among edge servers. If the requested data is not cached in any edge server, you need to visit the cloud data center to add the missing data to the edge server.

云数据中心是基于云计算架构的新型数据中心,云数据中心从物理上聚集了大量服务器, 是网络的逻辑中心,提供业务的源头。边缘服务器集群为由多个边缘服务器组成的协作缓存 域,每一边缘服务器的缓存空间均被分为冗余区域和独占区域。冗余区域缓存流行度最高的 若干数据,对于流行数据的访问请求可以尽可能的由本地边缘服务器直接响应,减少边缘服 务器间的数据传输频率;独占区域用于缓存其他边缘服务器未缓存的数据内容,从而保障整 体空间利用效率,且为了让各个边缘服务器的数据访问性能接近,将数据按照流行度排名均 匀的缓存到各个边缘服务器的独占区域。The cloud data center is a new type of data center based on cloud computing architecture. The cloud data center physically gathers a large number of servers, is the logical center of the network, and provides the source of services. An edge server cluster is a cooperative cache domain composed of multiple edge servers, and the cache space of each edge server is divided into a redundant area and an exclusive area. The redundant area caches some data with the highest popularity, and access requests for popular data can be directly responded by the local edge server as much as possible, reducing the frequency of data transmission between edge servers; the exclusive area is used to cache data content that is not cached by other edge servers , so as to ensure the overall space utilization efficiency, and in order to make the data access performance of each edge server similar, the data is evenly cached according to the popularity ranking in the exclusive area of each edge server.

控制器用于执行实施例2所述的云边协同缓存方法。控制器作为云数据中心和边缘服务 器协同缓存的枢纽,包括访问信息采集模块、缓存收益预测模块、边缘数据缓存模块、请求 处理模块和云服务访问模块。The controller is configured to execute the cloud-edge collaborative caching method described in Embodiment 2. The controller acts as a hub for collaborative caching between cloud data centers and edge servers, including access information collection module, caching revenue prediction module, edge data caching module, request processing module and cloud service access module.

访问信息采集模块用于统计处理数据访问请求时的相关信息,如数据对象在各个边缘服 务器内的访问次数、数据访问请求的命中情况、数据访问延迟等信息,得到数据访问信息。 数据访问过程中产生的延迟(即数据访问延迟)主要由以下三部分组成:用户访问本地边缘 服务器产生的数据下载延迟Ldown、边缘服务器间的数据传输延迟Ledge、访问云服务器产生 的数据传输延迟Lcloud,其中Ldown<Ledge<LcloudThe access information collection module is used to count relevant information when processing data access requests, such as the access times of data objects in each edge server, the hit situation of data access requests, data access delays, etc., to obtain data access information. The delay (ie data access delay) generated in the data access process is mainly composed of the following three parts: the data download delay L down caused by the user accessing the local edge server, the data transmission delay L edge between the edge servers, and the data transmission caused by accessing the cloud server. Delay L cloud , where L down < L edge < L cloud .

缓存收益预测模块用于基于上一周期的历史数据访问信息和实时统计的数据访问信息, 计算全局流行度、局部访问偏好、访问频率、缓存收益以及缓存收益阈值,以用于缓存策略 调整。The cache revenue prediction module is used to calculate the global popularity, local access preference, access frequency, cache revenue and cache revenue threshold based on the historical data access information of the previous cycle and the data access information of real-time statistics, so as to adjust the cache policy.

具体计算方法如下:The specific calculation method is as follows:

假设数据集合按照访问频率降序排列后为

Figure BDA0003441033110000051
假设第j个数据dj的出现 频率符合Zipf分布,则第j个数据的全局数据流行度Pj的计算公式如下:Suppose the data set is sorted in descending order of access frequency as
Figure BDA0003441033110000051
Assuming that the occurrence frequency of the jth data dj conforms to the Zipf distribution, the calculation formula of the global data popularity Pj of the jth data is as follows:

Figure BDA0003441033110000052
Figure BDA0003441033110000052

式(1)中,α为Zipf分布指数。In formula (1), α is the Zipf distribution index.

第j个数据对第i个边缘服务器的局部访问偏好

Figure BDA0003441033110000053
的计算公式如下:The local access preference of the jth data to the ith edge server
Figure BDA0003441033110000053
The calculation formula is as follows:

Figure BDA0003441033110000054
Figure BDA0003441033110000054

式(2)中,

Figure BDA0003441033110000055
为上一周期内,第i个边缘服务器ni的服务范围内,数据dj的累计请求量;
Figure BDA0003441033110000056
In formula (2),
Figure BDA0003441033110000055
is the cumulative request amount of data d j within the service range of the i-th edge server n i in the previous cycle;
Figure BDA0003441033110000056

边缘服务器ni内数据dj的访问频率

Figure BDA0003441033110000057
的计算公式为:Access frequency of data d j in edge server n i
Figure BDA0003441033110000057
The calculation formula is:

Figure BDA0003441033110000058
Figure BDA0003441033110000058

使用单位字节的访问概率表示数据dj被缓存在边缘服务器ni内的缓存收益

Figure BDA0003441033110000059
则 缓存收益的计算公式如下:Use the access probability of unit byte to represent the cache revenue of data d j being cached in edge server n i
Figure BDA0003441033110000059
The calculation formula of cache revenue is as follows:

Figure BDA00034410331100000510
Figure BDA00034410331100000510

式(4)中,sj为数据的数据大小。In formula (4), s j is the data size of the data.

边缘服务器ni的缓存收益阈值pti的计算公式如下:The calculation formula of the cache revenue threshold pt i of the edge server n i is as follows:

Figure BDA00034410331100000511
Figure BDA00034410331100000511

式(5)中,

Figure BDA00034410331100000512
为边缘服务器ni中缓存收益最低的数据dmin的缓存收益;Nn为边缘服务器的总个数。In formula (5),
Figure BDA00034410331100000512
is the cache revenue of the data d min with the lowest cache revenue in the edge server n i ; N n is the total number of edge servers.

若数据dj已缓存于协作缓存域内的若干节点M,且

Figure BDA00034410331100000513
则将数据dj缓存于ni的收 益大于成本的临界条件为:
Figure BDA0003441033110000061
If data dj has been cached in several nodes M in the cooperative cache domain, and
Figure BDA00034410331100000513
Then the critical condition that the benefit of caching data d j in ni is greater than the cost is:
Figure BDA0003441033110000061

边缘数据缓存模块用于利用基于可感知冗余的协作式边缘数据缓存策略进行缓存放置操 作,具体基于边缘服务器的缓存容量、网络状态、数据缓存收益等信息制定缓存放置方案, 将流行数据主动缓存至边缘服务器以优化用户侧的数据访问体验。基于可感知冗余的协作式 边缘数据缓存策略,也称为RCEDC(Redundancy-aware Cooperative EdgeData Caching), 其基于贪心策略将流行数据缓存于边缘服务器,以避免直接求解混合整形非线性规划问题所 产生的计算开销,并选择性的将部分数据缓存于多个边缘服务器,以权衡访问边缘服务器获 取缺失数据产生的额外开销以及访问云服务器获取缺失数据产生的额外开销。The edge data cache module is used to perform cache placement operations using a cooperative edge data cache strategy based on perceived redundancy. Specifically, a cache placement plan is formulated based on information such as the cache capacity, network status, and data cache revenue of the edge server, and the popular data is actively cached. to the edge server to optimize the data access experience on the user side. Redundancy-aware cooperative edge data caching strategy, also known as RCEDC (Redundancy-aware Cooperative Edge Data Caching), which caches popular data on edge servers based on greedy strategy to avoid directly solving hybrid shaping nonlinear programming problems. and selectively cache part of the data in multiple edge servers to balance the additional cost of accessing the edge server to obtain the missing data and the additional cost of accessing the cloud server to obtain the missing data.

RCEDC算法包括如下步骤:The RCEDC algorithm includes the following steps:

(1)将边缘数据缓存问题简化为线性规划问题求解初始缓存方案,决定冗余数据占比, 并确定边缘服务器中缓存收益最低的数据对象的流行度排名。(1) Simplify the edge data caching problem into a linear programming problem to solve the initial caching scheme, determine the proportion of redundant data, and determine the popularity ranking of the data objects with the lowest caching benefit in the edge server.

Figure BDA0003441033110000062
表示边缘服务器,
Figure BDA0003441033110000063
表示各个边缘服务器的缓存容量 (Cache Capacity),其中,Nn表示边缘服务器的总个数。
Figure BDA0003441033110000064
表示数据访问过 程中涉及的数据对象(Data),
Figure BDA0003441033110000065
表示数据大小(Size)。use
Figure BDA0003441033110000062
represents an edge server,
Figure BDA0003441033110000063
represents the cache capacity (Cache Capacity) of each edge server, where N n represents the total number of edge servers.
Figure BDA0003441033110000064
Represents the data objects (Data) involved in the data access process,
Figure BDA0003441033110000065
Indicates the data size (Size).

为求解适当的冗余数据占比,RCEDC将边缘数据缓存问题简化为线性规划问题:In order to solve the appropriate redundant data ratio, RCEDC reduces the edge data caching problem to a linear programming problem:

minL(Nr)minL(N r )

Figure BDA0003441033110000066
Figure BDA0003441033110000066

其中Nr为冗余区可缓存数据量,Ns为独占区可缓存数据量,L(Nr)表示为:Among them, N r is the amount of data that can be cached in the redundant area, N s is the amount of data that can be cached in the exclusive area, and L(N r ) is expressed as:

Figure BDA0003441033110000067
Figure BDA0003441033110000067

(2)使用贪心算法,将数据按照缓存收益降序排列,依次将未缓存数据添加到边缘服务 器中,缓存收益高的数据优先缓存。(2) Using the greedy algorithm, the data is arranged in descending order of cache revenue, and the uncached data is added to the edge server in turn, and the data with higher cache revenue is cached first.

(3)因数据dj在不同边缘服务器ni内的访问热度不同,即局部访问偏好

Figure RE-GDA0003564649840000068
存在差异, 按照dj的局部访问偏好对边缘服务器进行降序排列,得到有序的边缘服务器集合
Figure RE-GDA0003564649840000069
优先将数据添加到局部访问热度最高的边缘服务器即
Figure RE-GDA00035646498400000610
Figure RE-GDA00035646498400000611
剩余缓存空间大于数据对象的数据大小sj,则将dj缓存于
Figure RE-GDA00035646498400000612
若数据缓存成功,则进入步 骤(4),反之选择缓存收益次优的边缘服务器进行缓存。(3) Due to the different access heat of data d j in different edge servers ni , that is, local access preference
Figure RE-GDA0003564649840000068
There are differences, and the edge servers are sorted in descending order according to the local access preferences of d j to obtain an ordered set of edge servers
Figure RE-GDA0003564649840000069
Priority is given to adding data to the edge server with the highest local access, i.e.
Figure RE-GDA00035646498400000610
like
Figure RE-GDA00035646498400000611
If the remaining cache space is greater than the data size s j of the data object, then cache d j in
Figure RE-GDA00035646498400000612
If the data is cached successfully, go to step (4), otherwise select the edge server with the next best cache revenue for caching.

(4)遍历剩余边缘服务器

Figure BDA00034410331100000613
若数据缓存收益
Figure BDA00034410331100000614
大于缓存收益阈值ptm,且
Figure BDA0003441033110000071
剩余缓存容量大于数据对象大小sj则进行数据缓存,随后选择下一个边缘服务 器尝试进行数据缓存。(4) Traverse the remaining edge servers
Figure BDA00034410331100000613
If data caching benefits
Figure BDA00034410331100000614
is greater than the cache revenue threshold pt m , and
Figure BDA0003441033110000071
If the remaining cache capacity is greater than the data object size sj , data caching is performed, and then the next edge server is selected to attempt data caching.

请求处理模块用于利用基于可感知冗余的替换策略进行缓存替换操作,具体在数据访问 请求到达时,依次检索本地边缘服务器、协作缓存域内的相邻边缘服务器、云数据中心以获 取数据。若数据访问请求未命中,则需要将缺失数据添加到本地边缘服务器,此时若剩余缓 存空间不足,则需要淘汰已缓存数据中缓存收益最低的数据。The request processing module is used to perform a cache replacement operation using a replacement strategy based on perceived redundancy. Specifically, when a data access request arrives, it sequentially retrieves the local edge server, adjacent edge servers in the cooperative cache domain, and cloud data centers to obtain data. If the data access request is not hit, the missing data needs to be added to the local edge server. At this time, if the remaining cache space is insufficient, the data with the lowest cache benefit needs to be eliminated from the cached data.

请求处理模块采用面向分区缓存的基于可感知冗余的替换策略,称为RPCRS(Redundancy-aware Partitioned Cache Replacement Strategy),该策略为了感知数据冗 余状态,将边缘服务器内的缓存数据按获取来源分为两类:一类是由云数据中心获取的数据 (独占数据),这一类数据只缓存于单个边缘服务器;另一类是由从相邻边缘服务器获取的 数据(冗余数据),此类数据已被缓存于相邻的边缘服务器。如图2所示,RPCRS策略将每 个边缘服务器的缓存空间划分为C1、C2两个区域。其中,C1区域用于缓存相邻节点未缓存 的数据内容,且仅能缓存独占数据;而C2区域根据边缘服务器的本地访问热度缓存流行数据, 既可以缓存独占数据又可以缓存冗余数据。根据每个节点内的数据访问负载,冗余数据和独 占数据的占比可以自适应调整,从而优化数据访问性能。The request processing module adopts a partitioned cache-oriented replacement strategy based on perceived redundancy, called RPCRS (Redundancy-aware Partitioned Cache Replacement Strategy). In order to perceive the data redundancy state, the strategy divides the cached data in the edge server according to the source of acquisition. There are two types: one is the data obtained by the cloud data center (exclusive data), which is only cached in a single edge server; the other is the data obtained from adjacent edge servers (redundant data), which is Class data has been cached on adjacent edge servers. As shown in Figure 2, the RPCRS strategy divides the cache space of each edge server into two regions, C1 and C2. Among them, the C1 area is used to cache the data content that is not cached by adjacent nodes, and can only cache exclusive data; while the C2 area caches popular data according to the local access heat of the edge server, which can cache both exclusive data and redundant data. According to the data access load in each node, the proportion of redundant data and exclusive data can be adjusted adaptively to optimize data access performance.

RPCRS策略的数据添加过程具有以下流程:独占数据首次进入缓存时被添加到C1区域, C1区域缓存空间不足时,优先级最低的数据被降级至C2区域,并与冗余数据竞争缓存空间。 而独占数据位于C2区域时,若再次被命中则升级至C1区域;冗余数据首次进入缓存时被添 加到C2区域,且无论是否命中只能位于C2区域。冗余数据的最大可用缓存空间受制于C2区 域的缓存容量,但因为C2区域除了冗余数据还包括从C1区域降级下来的独占数据,两者竞 争C2区域的缓存空间。所以冗余数据实际使用的缓存空间,以C2区域的缓存容量为上界, 根据实际访问情况可以自适应调整空间占比。The data addition process of the RPCRS strategy has the following process: when the exclusive data enters the cache for the first time, it is added to the C1 area. When the cache space in the C1 area is insufficient, the data with the lowest priority is demoted to the C2 area, and competes with the redundant data for the cache space. When the exclusive data is located in the C2 area, if it is hit again, it will be upgraded to the C1 area; the redundant data is added to the C2 area when it enters the cache for the first time, and can only be located in the C2 area regardless of whether it is hit or not. The maximum available cache space for redundant data is limited by the cache capacity of the C2 area, but because the C2 area includes redundant data and exclusive data downgraded from the C1 area, the two compete for the cache space of the C2 area. Therefore, the cache space actually used by redundant data is based on the cache capacity of the C2 area as the upper bound, and the space ratio can be adjusted adaptively according to the actual access situation.

RPCRS策略计算边缘服务器中数据在C1和C2区域的优先级,其过程如下:The RPCRS strategy calculates the priority of data in the C1 and C2 regions in the edge server. The process is as follows:

计算数据dj的实时访问频率

Figure BDA0003441033110000072
Calculate the real-time access frequency of data d j
Figure BDA0003441033110000072

Figure BDA0003441033110000073
Figure BDA0003441033110000073

式(8)中,

Figure BDA0003441033110000074
表示当前周期开始时刻t0至当前时刻tcur,边缘服务器ni内数据dj的累 计访问次数;
Figure BDA0003441033110000075
则表示t0至tcur时间范围内,边缘服务器ni内所有缓存数据的累 计访问次数之和。In formula (8),
Figure BDA0003441033110000074
represents the cumulative access times of the data d j in the edge server n i from the start time t 0 of the current cycle to the current time t cur ;
Figure BDA0003441033110000075
It represents the sum of the cumulative access times of all cached data in edge server n i in the time range from t 0 to t cur .

使用指数加权移动平均fi j表示数据访问频率:Use exponentially weighted moving average f i j to represent data access frequency:

Figure BDA0003441033110000081
Figure BDA0003441033110000081

式(9)中,α表示加权因子,取值为0到1之间的小数。In formula (9), α represents a weighting factor, which is a decimal between 0 and 1.

边缘服务器ni的C1区域内数据dj的缓存优先级

Figure BDA0003441033110000082
为:Cache priority of data d j in the C1 region of edge server n i
Figure BDA0003441033110000082
for:

Figure BDA0003441033110000083
Figure BDA0003441033110000083

边缘服务器ni的C2区域内数据dj的优先级

Figure BDA0003441033110000084
为:The priority of data d j in the C2 region of edge server n i
Figure BDA0003441033110000084
for:

Figure BDA0003441033110000085
Figure BDA0003441033110000085

Weight为权重,由于驱逐独占数据具有更大的开销,故设:Weight is the weight. Since eviction of exclusive data has greater overhead, it is assumed that:

Figure BDA0003441033110000086
Figure BDA0003441033110000086

图3为RPCRS下的缓存内容更新流程图。处理数据访问请求时,依次检查本地边缘服务 器的C1区域和C2区域以及相邻边缘服务器的缓存空间是否包含请求数据。根据数据访问请 求的命中情况可以分为以下几种情况:FIG. 3 is a flowchart of cache content update under RPCRS. When processing a data access request, check whether the C1 area and C2 area of the local edge server and the cache space of the adjacent edge server contain the requested data in turn. According to the hit situation of the data access request, it can be divided into the following situations:

(1)在本地边缘服务器的C1区域命中请求数据。此时无新数据添加,只需要更新被访 问数据的缓存优先级;(1) Hit the request data in the C1 area of the local edge server. At this time, no new data is added, only the cache priority of the accessed data needs to be updated;

(2)在本地边缘服务器的C2区域命中请求数据。此时需要根据数据类型进行相应处理: 若命中的数据为独占数据,因独占数据未被缓存于相邻边缘服务器,且短时间内被再次命中, 故将数据升级至C1区域,若C1区域剩余缓存空间不足,则淘汰C1区域优先级最低的数据以 腾出足够的缓存空间;若命中数据为冗余数据,则只需要更新目标数据的优先级,避免冗余 数据抢占C1区域的缓存空间,从而起到限制冗余数据最大可用缓存空间的目的;(2) Hit the request data in the C2 area of the local edge server. At this time, it needs to be processed according to the data type: If the hit data is exclusive data, because the exclusive data is not cached in the adjacent edge server and is hit again in a short period of time, the data is upgraded to the C1 area. If the C1 area remains If the cache space is insufficient, the data with the lowest priority in the C1 area is eliminated to free up enough cache space; if the hit data is redundant data, only the priority of the target data needs to be updated to avoid redundant data preempting the cache space in the C1 area. So as to limit the maximum available cache space for redundant data;

(3)在相邻边缘服务器侧命中请求数据。此时需要访问相邻边缘服务器将缺失数据添加 到本地,并修改其数据类型为冗余数据,之后将数据添加到C2区域。若C2区域剩余缓存空 间不足,则淘汰C2区域优先级最低的数据,再进行数据缓存,并为新数据初始化其数据流行 度;(3) Hit the request data on the adjacent edge server side. At this time, it is necessary to access the adjacent edge server to add the missing data locally, and modify its data type to redundant data, and then add the data to the C2 area. If the remaining cache space in the C2 area is insufficient, the data with the lowest priority in the C2 area will be eliminated, and then the data will be cached, and its data popularity will be initialized for the new data;

(4)在协作缓存域内未能命中请求数据。此时需要访问云服务器获取缺失数据,并修改 数据类型为独占数据,之后将数据添加到C1区域。若C1区域剩余缓存空间不足以缓存新数 据,则将优先级较低的数据降级至C2区域,以腾出C1区域缓存空间。独占数据被降级时更 新其缓存优先级,并与冗余数据竞争C2区域缓存空间。虽然独占数据和冗余数据相比有更高 的优先级权重,在竞争缓存空间时占优,但是若独占数据长期未被访问,伴随着缓存内容的 更新,访问频率较低的独占数据会被及时淘汰。(4) Failed to hit the requested data in the cooperative cache domain. In this case, you need to access the cloud server to obtain the missing data, change the data type to exclusive data, and then add the data to the C1 area. If the remaining cache space in the C1 area is not enough to cache new data, the data with lower priority will be demoted to the C2 area to free up the C1 area cache space. When exclusive data is demoted, its cache priority is updated, and it competes with redundant data for C2 area cache space. Although exclusive data has a higher priority weight than redundant data, it has an advantage in competing for cache space, but if the exclusive data has not been accessed for a long time, along with the update of the cache content, the exclusive data with lower access frequency will be deleted. eliminated in time.

基于上述替换策略,冗余数据实际占用的缓存空间以C2区域缓存容量为上界,根据数据 访问负载可以动态调整实际空间占比。故而每个边缘服务器可以根据实际的访问负载自适应 调整两类数据的空间占比,进一步优化不同访问负载下的数据访问性能。Based on the above replacement strategy, the cache space actually occupied by redundant data is upper bounded by the cache capacity of the C2 area, and the actual space ratio can be dynamically adjusted according to the data access load. Therefore, each edge server can adaptively adjust the space ratio of the two types of data according to the actual access load, and further optimize the data access performance under different access loads.

云服务访问模块用于在请求数据未被缓存于协作缓存域中时,则需要调用云服务访问模 块访问云数据中心获取缺失数据,并将数据添加到本地边缘服务器。The cloud service access module is used to call the cloud service access module to access the cloud data center to obtain missing data when the requested data is not cached in the collaborative cache domain, and add the data to the local edge server.

相比现有方案,本实施例按照数据是否缓存于多个边缘服务器,将数据分为冗余数据和 独占数据,并对冗余数据的最大可用缓存空间进行限制,从而提高了空间利用率。总体而言, 本实施例为了解决当前分布式存储领域边缘存储相关的问题,以基于可感知冗余的协作式边 缘数据缓存策略RCEDC和面向分区缓存的基于可感知冗余的替换策略RPCRS为核心,建立了 一套云端-边缘协同缓存系统。Compared with the existing solution, this embodiment divides data into redundant data and exclusive data according to whether the data is cached in multiple edge servers, and limits the maximum available cache space of redundant data, thereby improving space utilization. In general, in order to solve the problems related to edge storage in the current distributed storage field, this embodiment takes the redundant-perceived cooperative edge data caching strategy RCEDC and the partition-oriented cache-oriented replacement strategy RPCRS based on perceptible redundancy as the core. , and established a cloud-edge collaborative caching system.

在过去的既有方案中,一方面,重视缓存命中率,尽可能增加冗余数据的策略会导致空 间利用率低下,边缘缓存服务器存储资源严重不足;另一方面,希望极可能提高边缘协作能 力,不设冗余缓存的策略会导致频繁访问云端,极大增加访问延迟。在实验中可以证明,本 实施例所提供的云边协同缓存系统能够适应当下的网络环境,有效降低执行时间及提高缓存 命中率。通过以下实验对本实施例的有效性加以证明:In the existing solutions in the past, on the one hand, the strategy of attaching importance to the cache hit rate and increasing redundant data as much as possible will lead to low space utilization and serious shortage of edge cache server storage resources; on the other hand, it is hoped that it is possible to improve edge collaboration capabilities , the strategy of not setting redundant cache will lead to frequent access to the cloud and greatly increase the access delay. It can be proved in experiments that the cloud-edge collaborative caching system provided by this embodiment can adapt to the current network environment, effectively reduce the execution time and improve the cache hit rate. The effectiveness of this embodiment is proved by the following experiments:

实验使用的是名为SimEdgeIntel的边缘计算模拟器。本实施例在Linux环境下进行测试 实验,服务器配置2个Intel Xeon E5-2620 CPU,128GB内存;使用到的软件及其版本号如 表1所示。The experiments used an edge computing simulator called SimEdgeIntel. In this embodiment, the test experiment is carried out in the Linux environment, and the server is configured with 2 Intel Xeon E5-2620 CPUs and 128 GB of memory; the software used and its version numbers are shown in Table 1.

表1软件版本Table 1 Software Versions

Figure BDA0003441033110000091
Figure BDA0003441033110000091

实验负载选用2016年对全美排行前十的网站的100万条匿名数据访问请求(下文称 cdn-request-18),该负载曾在多项研究工作中被使用。该负载下数据对象流行度分布如图 4所示。该负载包含约45万个被访问的数据对象,其中的数据访问请求集中于少数的流行数 据,而排名1000之后的数据,其内容流行度趋于0,即大部分数据为仅存在少数几次访问的 冷数据,符合Zipf分布数据流行度特性。使用Zipf分布拟合数据流行度分布曲线,得到Zipf 分布指数为0.74。基于拟合结果生成200万条数据访问请求作为合成负载(下文称 zipf-0.74)。用到的实验负载如表2所示:The experimental load selected 1 million anonymous data access requests to the top ten websites in the United States in 2016 (hereinafter referred to as cdn-request-18), which has been used in many research works. The popularity distribution of data objects under this load is shown in Figure 4. The load contains about 450,000 accessed data objects, among which data access requests are concentrated on a small number of popular data, while the content popularity of data ranked after 1,000 tends to 0, that is, most of the data exists only a few times The accessed cold data conforms to the popularity characteristics of Zipf distributed data. Using the Zipf distribution to fit the data popularity distribution curve, the Zipf distribution index is 0.74. Based on the fitting results, 2 million data access requests are generated as a synthetic payload (hereinafter referred to as zipf-0.74). The experimental loads used are shown in Table 2:

表2实验负载Table 2 Experimental load

Figure BDA0003441033110000092
Figure BDA0003441033110000092

Figure BDA0003441033110000101
Figure BDA0003441033110000101

实验参数如表3所示,设置边缘服务器间的数据传输延迟为20ms,云服务器访问延迟为 200ms。为了研究缓存容量对系统性能的影响,设置缓存容量为数据总量的1%~10%;为了研 究边缘服务器数量对系统性能的影响,设置边缘服务器数量为2~6;为了研究边缘网络状 态对系统性能的影响,设置边缘服务器间的数据传输延迟的变化范围为20~80ms。The experimental parameters are shown in Table 3. The data transmission delay between edge servers is set to 20ms, and the access delay of cloud server is 200ms. In order to study the impact of cache capacity on system performance, the cache capacity is set to be 1% to 10% of the total data; in order to study the impact of the number of edge servers on system performance, the number of edge servers is set to be 2 to 6; in order to study the effect of edge network status on the system performance Due to the impact of system performance, the data transmission delay between edge servers is set to vary from 20ms to 80ms.

表3实验参数表Table 3 Experimental parameter table

Figure BDA0003441033110000102
Figure BDA0003441033110000102

为了评估缓存策略性能,下面介绍性能评估的指标:In order to evaluate the cache policy performance, the following describes the performance evaluation indicators:

1、缓存命中率:用户请求命中次数和请求总数的比值,请求命中即数据访问请求可以由 本地边缘服务器或相邻边缘服务器处理;1. Cache hit rate: the ratio of the number of user request hits to the total number of requests. A request hit means that a data access request can be processed by the local edge server or an adjacent edge server;

2、平均延迟:数据总访问延迟和请求总数的比值,平均延迟是用户能直观感觉到的重要 指标,也是本发明优化的首要目标;2. Average delay: the ratio of the total data access delay to the total number of requests, the average delay is an important indicator that the user can intuitively feel, and is also the primary target of the optimization of the present invention;

3、卸载流量:边缘端处理的总数据流量。边缘缓存技术的一个重要作用是在边缘端处理 用户对于流行数据的访问请求,以解决云架构下的带宽瓶颈,故现有研究工作中多使用卸载 流量作为边缘缓存策略的评估指标;3. Offload traffic: the total data traffic processed by the edge. An important role of edge caching technology is to process users' access requests for popular data at the edge to solve the bandwidth bottleneck under the cloud architecture. Therefore, in the existing research work, offload traffic is often used as an evaluation index for edge caching strategies;

4、执行时间:考虑到本实施例提出的RCEDC策略旨在有限时间内获取缓存放置方案,故 在缓存放置策略的实验中比较各个算法的执行时间。4. Execution time: Considering that the RCEDC strategy proposed in this embodiment aims to obtain a cache placement scheme within a limited time, the execution time of each algorithm is compared in the cache placement strategy experiment.

下面使用SimEdgeIntel内置的缓存放置策略Greedy、分布式协作缓存策略(以下称 Contrast)和本文提出的基于冗余感知的协作式边缘数据缓存策略RCEDC进行对比实验。其 中,Greedy以最大化协作缓存域总缓存收益作为目标,在各个边缘服务器侧放置缓存收益最 高的数据,使边缘端缓存数据的总收益最大。对比三种缓存放置策略下的命中率、平均延迟、 卸载流量、执行时间等四个指标。设置边缘服务器缓存容量C=0.75GB,边缘服务器数量 Nn=3,边缘服务器间数据传输延迟Ledge=20ms。分别在cdn-request-18和zipf-0.74 两种负载下进行实验,实验结果如图5所示。The following uses SimEdgeIntel's built-in cache placement strategy Greedy, distributed cooperative caching strategy (hereinafter referred to as Contrast) and the redundancy-aware cooperative edge data caching strategy RCEDC proposed in this paper for comparative experiments. Among them, Greedy aims to maximize the total cache revenue of the cooperative cache domain, and places the data with the highest cache revenue on each edge server side to maximize the total revenue of the edge cached data. Compare the hit rate, average latency, offload traffic, and execution time under the three cache placement strategies. Set the edge server cache capacity C=0.75GB, the number of edge servers N n =3, and the data transmission delay between edge servers L edge =20ms. Experiments were carried out under two loads of cdn-request-18 and zipf-0.74, respectively, and the experimental results are shown in Figure 5.

如图5(a)和图5(b)所示,图5(a)为cdn-request-18负载下的命中率对比图,图5(b) 为zipf-0.74负载下的命中率对比图。在两种负载下,Contrast和RCEDC的命中率相近且均 远高于Greedy,而Contrast和RCEDC的平均命中率差值最高为0.3%。As shown in Figure 5(a) and Figure 5(b), Figure 5(a) is a comparison chart of hit ratio under cdn-request-18 load, and Figure 5(b) is a comparison chart of hit ratio under zipf-0.74 load . Contrast and RCEDC have similar hit rates and much higher than Greedy under both loads, while Contrast and RCEDC have an average hit rate difference of up to 0.3%.

如图5(c)和图5(d)所示,图5(c)为cdn-request-18负载下的卸载流量对比图,图5(d) 为zipf-0.74负载下的卸载流量对比图。在两种负载下,Contrast和RCEDC的卸载流量相近 且均远高于Greedy,而在zipf-0.74负载下因请求数量翻倍,卸载流量进一步提高。As shown in Figure 5(c) and Figure 5(d), Figure 5(c) is a comparison chart of unloaded traffic under cdn-request-18 load, and Figure 5(d) is a comparison chart of unloaded traffic under zipf-0.74 load . Under the two loads, the offload traffic of Contrast and RCEDC is similar and much higher than that of Greedy, while under the zipf-0.74 load, the offload traffic is further improved due to the doubling of the number of requests.

如图5(e)和图5(f)所示,图5(e)为cdn-request-18负载下的平均延迟对比图,图5(f) 为zipf-0.74负载下的平均延迟对比图。在cdn-request-18负载下,与Contrast和Greedy 相比,RCEDC将平均延迟分别减少了5.64%和16.00%,在zipf-0.74负载下,与Contrast和 Greedy相比,RCEDC将平均延迟分别减少了5.63%和14.43%。As shown in Figure 5(e) and Figure 5(f), Figure 5(e) is the average delay comparison chart under cdn-request-18 load, and Figure 5(f) is the average delay comparison chart under zipf-0.74 load . Under cdn-request-18 load, RCEDC reduces the average delay by 5.64% and 16.00% compared to Contrast and Greedy, respectively, and under zipf-0.74 load, RCEDC reduces the average delay compared to Contrast and Greedy, respectively 5.63% and 14.43%.

如图5(g)和图5(h)所示,图5(g)为cdn-request-18负载下的执行时间对比图,图5(h) 为zipf-0.74负载下的执行时间对比图。在两种负载下,RCEDC的执行时间和Greedy相近, 且均低于Contrast。在cdn-request-18负载下,比起Contrast,RCEDC的执行时间减少了 81.46%。As shown in Figure 5(g) and Figure 5(h), Figure 5(g) is a comparison chart of execution time under cdn-request-18 load, and Figure 5(h) is a comparison chart of execution time under zipf-0.74 load . Under both loads, the execution time of RCEDC is similar to that of Greedy and lower than that of Contrast. Under the cdn-request-18 load, the execution time of RCEDC is reduced by 81.46% compared to Contrast.

上述实验结果表明:相比于Contrast,RCEDC可以显著降低平均延迟,并大幅度缩短算 法执行时间;相比于Greedy,RCEDC在命中率、卸载流量和平均延迟等指标上都有显著的改 善。即RCEDC可以在有限时间内,得到更接近于最优的缓存放置方案。The above experimental results show that compared with Contrast, RCEDC can significantly reduce the average delay and greatly shorten the algorithm execution time; compared with Greedy, RCEDC has significant improvements in hit rate, offload traffic and average delay. That is, RCEDC can get closer to the optimal cache placement scheme in a limited time.

接下来使用LRU、GDS-LF和RPCRS进行对比实验。其中,LRU基于数据访问时间间隔更 新缓存内容。首先对比三种缓存替换策略下的命中率,平均延迟和卸载流量这三个指标,使 用的配置及负载与前文相同。实验结果如图6所示。Next, we use LRU, GDS-LF and RPCRS to conduct comparative experiments. Among them, the LRU updates the cache content based on the data access time interval. First, compare the hit rate, average delay, and offload traffic under the three cache replacement strategies. The configuration and load used are the same as the previous ones. The experimental results are shown in Figure 6.

如图6(a)和图6(b)所示,图6(a)为cdn-request-18负载下的命中率对比图,图6(b) 为zipf-0.74负载下的命中率对比图。在命中率上的表现,RPCRS优于LRU和GDS-LF。在cdn-request-18负载下,RPCRS相比于GDS-LF和LRU,命中率分别提高了18.85%和25.20%,且三种缓存策略在zipf-0.74负载下均有更高的命中率。As shown in Figure 6(a) and Figure 6(b), Figure 6(a) is a comparison chart of hit ratio under cdn-request-18 load, and Figure 6(b) is a comparison chart of hit ratio under zipf-0.74 load . In terms of hit rate, RPCRS outperforms LRU and GDS-LF. Under the load of cdn-request-18, compared with GDS-LF and LRU, the hit rate of RPCRS is increased by 18.85% and 25.20%, respectively, and the three cache strategies all have higher hit rate under the load of zipf-0.74.

如图6(c)和图6(d)所示,图6(c)为cdn-request-18负载下的卸载流量对比图,图6(d) 为zipf-0.74负载下的卸载流量对比图。在卸载流量上的表现,RPCRS优于LRU和GDS-LF。 在cdn-request-18负载下,RPCRS相比于GDS-LF和LRU,卸载流量分别提高了66.11%和88.11%,且三种缓存策略在zipf-0.74负载下卸载流量进一步提高。As shown in Figure 6(c) and Figure 6(d), Figure 6(c) is a comparison chart of unloaded traffic under cdn-request-18 load, and Figure 6(d) is a comparison chart of unloaded traffic under zipf-0.74 load . On offloading traffic, RPCRS outperforms LRU and GDS-LF. Under the load of cdn-request-18, compared with GDS-LF and LRU, the offload traffic of RPCRS is increased by 66.11% and 88.11% respectively, and the three cache strategies further improve the offload traffic under the load of zipf-0.74.

如图6(e)和图6(f)所示,图6(e)为cdn-request-18负载下的平均延迟对比图,图6(f) 为zipf-0.74负载下的平均延迟对比图。在平均延迟上的表现,RPCRS优于LRU和GDS-LF。 在cdn-request-18负载下,RPCRS相比于GDS-LF和LRU,平均延迟分别降低了12.70%和15.97%,且三种缓存策略在zipf-0.74负载下平均延迟进一步降低。As shown in Figure 6(e) and Figure 6(f), Figure 6(e) is the average delay comparison chart under cdn-request-18 load, and Figure 6(f) is the average delay comparison chart under zipf-0.74 load . In terms of average latency, RPCRS outperforms LRU and GDS-LF. Under the load of cdn-request-18, the average delay of RPCRS is reduced by 12.70% and 15.97%, respectively, compared with GDS-LF and LRU, and the average delay of the three caching strategies is further reduced under the load of zipf-0.74.

分析三种缓存替换策略在命中率和卸载流量上的表现,RPCRS缓存策略将数据按照是否 缓存于多个边缘服务器进行数据分区缓存,通过限制冗余数据最大可用缓存空间,使得协作 缓存域内数据冗余量显著降低,故相比于另外两种缓存替换策略,RPCRS能显著提高缓存命 中率。此时,因为更多的数据访问请求可以在协作缓存域内被处理,RPCRS缓存策略下的卸 载流量也远高于LRU和GDS-LF。再分析三种缓存替换策略在平均延迟上的表现,由于数据访 问过程中存在大量仅访问一次的大对象,这部分数据的缓存会导致大量流行数据被驱逐出缓 存,在缓存空间较小时会严重影响总数据访问延迟。RPCRS和GDS-LF缓存策略考虑了数据大 小对系统性能的影响,不流行的大对象可以快速的被剔除出缓存,对于系统性能影响有限。 而LRU未能考虑数据大小差异,也不考虑数据访问频率差异,无法避免不流行的大对象对缓 存空间的污染。Analyze the performance of the three cache replacement strategies in terms of hit rate and offload traffic. The RPCRS cache strategy partitions and caches data according to whether it is cached in multiple edge servers. By limiting the maximum available cache space for redundant data, the data in the collaborative cache domain is redundant. The margin is significantly reduced, so compared with the other two cache replacement strategies, RPCRS can significantly improve the cache hit rate. At this time, because more data access requests can be processed in the cooperative cache domain, the offload traffic under the RPCRS caching strategy is also much higher than that of LRU and GDS-LF. Then analyze the performance of the three cache replacement strategies on the average delay. Since there are a large number of large objects that are only accessed once during the data access process, the cache of this part of the data will cause a large number of popular data to be expelled from the cache, which will be serious when the cache space is small. Affects total data access latency. The RPCRS and GDS-LF caching strategies take into account the impact of data size on system performance. Unpopular large objects can be quickly removed from the cache with limited impact on system performance. However, LRU fails to take into account the difference in data size and the frequency of data access, so it cannot avoid the pollution of cache space caused by unpopular large objects.

和GDS-LF相比,RPCRS一方面通过限制冗余数据量提高边缘端空间利用率,减少对云服 务器的访问;另一方面将流行数据缓存在各个边缘服务器的冗余区域,使得大部分数据访问 请求可以由本地边缘服务器处理,减少了边缘服务器之间的数据传输频率。所以,比起GDS-LF, RPCRS能进一步降低平均延迟。Compared with GDS-LF, RPCRS improves edge space utilization and reduces access to cloud servers by limiting the amount of redundant data; on the other hand, it caches popular data in redundant areas of each edge server, making most data Access requests can be handled by local edge servers, reducing the frequency of data transmission between edge servers. Therefore, compared to GDS-LF, RPCRS can further reduce the average delay.

以上分别比较了RCEDC、RPCRS和最新研究工作在命中率、平均延迟、卸载流量和运行时 间上的表现。实验结果指出,RCEDC可以使平均延迟最多下降25.40%,执行时间最多缩短 81.46%;RPCRS可以使缓存命中率最多提高29.15%,平均延迟最多下降16.80%。The above compares the performance of RCEDC, RPCRS and the latest research work on hit rate, average latency, offload traffic and runtime respectively. The experimental results show that RCEDC can reduce the average delay by up to 25.40% and the execution time by up to 81.46%; RPCRS can improve the cache hit rate by up to 29.15% and the average delay by up to 16.80%.

实施例2:Example 2:

本实施例用于提供一种基于可感知冗余的云边协同缓存方法,基于实施例1所述的云边 系统缓存系统进行工作,如图7所示,所述云边协同缓存方法包括:The present embodiment is used to provide a cloud-edge collaborative caching method based on perceived redundancy, and works based on the cloud-edge system caching system described in Embodiment 1. As shown in Figure 7, the cloud-edge collaborative caching method includes:

在每一周期内,执行以下操作:During each cycle, do the following:

S1:基于上一周期的数据访问信息,利用基于可感知冗余的协作式边缘数据缓存策略将 数据集合中的数据选择性缓存于多个边缘服务器;所有所述边缘服务器构成协作缓存域;S1: Based on the data access information of the previous cycle, the data in the data set is selectively cached in multiple edge servers using a cooperative edge data caching strategy based on perceived redundancy; all the edge servers form a cooperative cache domain;

S1可以包括:S1 can include:

(1)基于上一周期的数据访问信息计算缓存条件;所述缓存条件包括数据集合中的每一 数据对各个边缘服务器的局部访问偏好,每一数据被缓存在各个边缘服务器内的缓存收益以 及每一边缘服务器的缓存收益阈值;数据访问信息包括每一数据的访问频率、每一数据在每 一边缘服务器服务范围内的累计请求量和数据访问延迟;(1) Calculate the cache condition based on the data access information of the previous cycle; the cache condition includes the local access preference of each data in the data set to each edge server, the cache revenue of each data cached in each edge server, and The cache revenue threshold of each edge server; the data access information includes the access frequency of each data, the cumulative request volume and data access delay of each data within the service scope of each edge server;

具体的,获取数据访问信息由访问信息采集模块实现,根据数据访问信息计算缓存条件 由缓存收益预测模块实现。Specifically, the acquisition of data access information is implemented by the access information collection module, and the calculation of cache conditions according to the data access information is implemented by the cache revenue prediction module.

更为具体的,基于上一周期的数据访问信息计算缓存条件包括:More specifically, the cache conditions calculated based on the data access information of the previous cycle include:

1)对于每一数据,根据数据在每一边缘服务器服务范围内的累计请求量,采用实施例1 中的式(2)计算数据对每一边缘服务器的局部访问偏好,得到数据集合中的每一数据对各个 边缘服务器的局部访问偏好。1) For each data, according to the cumulative request amount of the data within the service range of each edge server, the local access preference of the data to each edge server is calculated by using the formula (2) in Embodiment 1, and each data set in the data set is obtained. The local access preference of a data to each edge server.

2)基于数据的访问频率,采用实施例1中的式(1)计算数据的全局数据流行度;根据 全局数据流行度和数据对每一边缘服务器的局部访问偏好计算数据被缓存在每一边缘服务器 内的缓存收益,得到每一数据被缓存在各个边缘服务器内的缓存收益。2) Based on the access frequency of the data, the global data popularity of the data is calculated using the formula (1) in Embodiment 1; the data is cached at each edge according to the global data popularity and the local access preference of the data to each edge server. Cache revenue in the server, get the cache revenue of each data cached in each edge server.

对于每一边缘服务器,将全局数据流行度和数据对边缘服务器的局部访问偏好进行求和, 利用实施例1中的式(3)得到边缘服务器内该数据的访问频率。根据边缘服务器内该数据的 访问频率和数据的数据大小,利用实施例1中的式(4)计算数据被缓存在边缘服务器内的缓 存收益。For each edge server, the global data popularity and the local access preference of the data to the edge server are summed, and the access frequency of the data in the edge server is obtained by using the formula (3) in Embodiment 1. According to the access frequency of the data in the edge server and the data size of the data, the formula (4) in Embodiment 1 is used to calculate the cache revenue of the data being cached in the edge server.

3)根据每一数据被缓存在各个边缘服务器内的缓存收益确定每一边缘服务器对应的缓存 收益的最小值,根据所有最小值和数据访问延迟,利用实施例1中的式(5)计算每一边缘服 务器的缓存收益阈值。3) Determine the minimum value of the cache revenue corresponding to each edge server according to the cache revenue of each data cached in each edge server, and calculate each Cache revenue threshold for an edge server.

(2)将数据集合中的数据按照缓存收益进行降序排列,基于缓存条件依次将数据选择性 缓存于多个边缘服务器。(2) Arrange the data in the data set in descending order according to the cache revenue, and selectively cache the data in multiple edge servers in sequence based on the cache conditions.

具体的,基于缓存条件依次将数据选择性缓存于多个边缘服务器可以包括:Specifically, sequentially selectively caching data in multiple edge servers based on caching conditions may include:

1)选取第一个数据作为待缓存数据;1) Select the first data as the data to be cached;

2)根据待缓存数据对各个边缘服务器的局部访问偏好对所有边缘服务器进行降序排列, 得到有序的边缘服务器集合;并选取边缘服务器集合中的第一个边缘服务器作为待存储边缘 服务器;2) Arrange all edge servers in descending order according to the local access preference of each edge server according to the data to be cached to obtain an ordered set of edge servers; and select the first edge server in the set of edge servers as the edge server to be stored;

3)判断待存储边缘服务器的剩余缓存空间是否大于待缓存数据的数据大小,得到第一判 断结果;3) Determine whether the remaining cache space of the edge server to be stored is larger than the data size of the data to be cached, and obtain a first judgment result;

4)若第一判断结果为否,则选取边缘服务器集合中的下一个边缘服务器作为待存储边缘 服务器,返回“判断待存储边缘服务器的剩余缓存空间是否大于待缓存数据的数据大小”的 步骤;4) if the first judgment result is no, then select the next edge server in the edge server set as the edge server to be stored, and return the step of "judging whether the remaining cache space of the edge server to be stored is greater than the data size of the data to be cached";

5)若第一判断结果为是,则将待缓存数据缓存于待存储边缘服务器,同时将待缓存数据 缓存于除待存储边缘服务器之外的且符合预设条件的边缘服务器;所述预设条件为待缓存数 据被缓存在边缘服务器内的缓存收益大于边缘服务器的缓存收益阈值以及边缘服务器的剩余 缓存空间大于待缓存数据的数据大小;5) If the first judgment result is yes, cache the data to be cached on the edge server to be stored, and at the same time cache the data to be cached on edge servers other than the edge server to be stored and meet the preset conditions; the preset The condition is that the cache revenue of the data to be cached in the edge server is greater than the cache revenue threshold of the edge server and the remaining cache space of the edge server is greater than the data size of the data to be cached;

6)判断数据集合内的数据是否均已被缓存;6) Determine whether the data in the data set has been cached;

7)若否,则选取位于待缓存数据之后的下一个数据作为待缓存数据,返回“根据待缓存 数据对各个边缘服务器的局部访问偏好对所有边缘服务器进行降序排列”的步骤。7) If not, select the next data after the data to be cached as the data to be cached, and return to the step of "arranging all edge servers in descending order according to the local access preference of each edge server according to the data to be cached".

S2:在接收到数据访问请求后,利用基于可感知冗余的替换策略对各个所述边缘服务器 的缓存空间进行管理。S2: After receiving the data access request, use a replacement strategy based on perceived redundancy to manage the cache space of each of the edge servers.

S2可以包括:S2 can include:

(1)接收到数据访问请求后,判断数据访问请求所请求获取的目标数据是否位于本地边 缘服务器的独占区域,得到第二判断结果;所述本地边缘服务器为覆盖发出数据访问请求的 终端的边缘服务器;每一边缘服务器的缓存空间均被划分为独占区域和冗余区域;所述独占 区域用于缓存其余边缘服务器未缓存的独占数据;所述冗余区域用于缓存独占数据和其余边 缘服务器也缓存的冗余数据;(1) After receiving the data access request, determine whether the target data requested by the data access request is located in the exclusive area of the local edge server, and obtain a second judgment result; the local edge server is the edge covering the terminal that issued the data access request. server; the cache space of each edge server is divided into an exclusive area and a redundant area; the exclusive area is used to cache the exclusive data not cached by the remaining edge servers; the redundant area is used to cache the exclusive data and the remaining edge servers Also cached redundant data;

(2)若第二判断结果为是,则更新目标数据的优先级;(2) if the second judgment result is yes, then update the priority of the target data;

(3)若第二判断结果为否,则判断目标数据是否位于本地边缘服务器的冗余区域,得到 第三判断结果;(3) if the second judgment result is no, then judge whether the target data is located in the redundant area of the local edge server, and obtain the third judgment result;

(4)若第三判断结果为是,则判断目标数据是否为独占数据;若是,则将目标数据升级 缓存至独占区域;若否,则更新目标数据的优先级;(4) if the third judgment result is yes, then judge whether the target data is exclusive data; if so, then the target data is upgraded and cached to the exclusive area; if not, then the priority of the target data is updated;

将目标数据升级缓存至独占区域可以包括:判断独占区域的剩余缓存空间是否大于目标 数据的数据大小;若是,则将目标数据升级缓存至独占区域;若否,则去除独占区域中优先 级最低的独占数据,并返回“判断独占区域的剩余缓存空间是否大于目标数据的数据大小” 的步骤。Upgrading the cache of the target data to the exclusive area may include: judging whether the remaining cache space of the exclusive area is larger than the data size of the target data; if so, upgrading the cache of the target data to the exclusive area; if not, removing the one with the lowest priority in the exclusive area Exclusive data, and return to the step of "judging whether the remaining cache space of the exclusive area is larger than the data size of the target data".

(5)若第三判断结果为否,则判断目标数据是否位于相邻边缘服务器,得到第四判断结 果;相邻边缘服务器为除本地边缘服务器之外的其余边缘服务器;(5) if the 3rd judgment result is no, then judge whether the target data is located in the adjacent edge server, obtain the 4th judgment result; The adjacent edge server is the remaining edge servers except the local edge server;

(6)若第四判断结果为是,则访问相邻边缘服务器,以将目标数据作为冗余数据添加至 本地边缘服务器的冗余区域;(6) if the 4th judgment result is yes, then visit adjacent edge server, to add target data as redundant data to the redundant area of local edge server;

将目标数据作为冗余数据添加至本地边缘服务器的冗余区域可以包括:判断冗余区域的 剩余缓存空间是否大于目标数据的数据大小;若是,则将目标数据缓存至冗余区域;若否, 则去除冗余区域中优先级最低的数据,并返回“判断冗余区域的剩余缓存空间是否大于目标 数据的数据大小”的步骤。Adding the target data as redundant data to the redundant area of the local edge server may include: judging whether the remaining cache space of the redundant area is larger than the data size of the target data; if so, caching the target data in the redundant area; if not, Then remove the data with the lowest priority in the redundant area, and return to the step of "judging whether the remaining cache space of the redundant area is larger than the data size of the target data".

(7)若第四判断结果为否,则访问云数据中心,以将目标数据作为独占数据添加至本地 边缘服务器的独占区域。(7) If the fourth judgment result is no, access the cloud data center to add the target data as exclusive data to the exclusive area of the local edge server.

尽管在此结合各实施例对本发明进行了描述,然而,在实施所要求保护的本发明过程中, 本领域技术人员通过查看附图、公开内容、以及所附权利要求书,可理解并实现公开实施例 的其他变化。在权利要求中,“包括”(comprising)一词不排除其他组成部分或步骤,“一” 或“一个”不排除多个的情况。单个处理器或其他单元可以实现权利要求中列举的若干项功 能。相互不同的从属权利要求中记载了某些措施,但这并不表示这些措施不能组合起来产生 良好的效果。Although the invention is described herein in conjunction with various embodiments, those skilled in the art can understand and implement the disclosure by reviewing the drawings, the disclosure, and the appended claims in practicing the claimed invention. Other variations of the embodiment. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that these measures cannot be combined to advantage.

尽管结合具体特征及其实施例对本发明进行了描述,显而易见的,在不脱离本发明的精 神和范围的情况下,可对其进行各种修改和组合。相应地,本说明书和附图仅仅是所附权利 要求所界定的本发明的示例性说明,且视为已覆盖本发明范围内的任意和所有修改、变化、 组合或等同物。显然,本领域的技术人员可以对本发明进行各种改动和变型而不脱离本发明 的精神和范围。这样,倘若本发明的这些修改和变型属于本发明权利要求及其等同技术的范 围之内,则本发明也意图包括这些改动和变型在内。Although the invention has been described in conjunction with specific features and embodiments thereof, it will be apparent that various modifications and combinations can be made therein without departing from the spirit and scope of the invention. Accordingly, this specification and drawings are merely illustrative of the invention as defined by the appended claims, and are deemed to cover any and all modifications, variations, combinations or equivalents within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit and scope of the invention. Thus, provided that these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A cloud edge cooperative caching method based on perceptual redundancy is characterized by comprising the following steps:
in each cycle, the following operations are performed:
selectively caching data in the data set in a plurality of edge servers by utilizing a cooperative edge data caching strategy based on the perceptible redundancy based on the data access information of the previous period; all the edge servers form a cooperative cache domain;
and after receiving the data access request, managing the cache space of each edge server by using a replacement strategy based on perceptible redundancy.
2. The cloud-edge cooperative caching method according to claim 1, wherein the selectively caching data in the data set in a plurality of edge servers by using a cooperative edge data caching policy based on perceptual redundancy based on the data access information of the previous cycle specifically comprises:
calculating a cache condition based on the data access information of the previous period; the caching condition comprises local access preference of each data in the data set to each edge server, caching income of each data cached in each edge server and caching income threshold of each edge server; the data access information comprises the access frequency of each data, the accumulated request quantity of each data in the service range of each edge server and the data access delay;
and performing descending order arrangement on the data in the data set according to cache benefits, and selectively caching the data in a plurality of edge servers in sequence based on the cache conditions.
3. The cloud-edge cooperative caching method according to claim 2, wherein the calculating the caching condition based on the data access information of the previous cycle specifically includes:
for each piece of data, calculating local access preference of the data to each edge server according to the accumulated request amount of the data in the service range of each edge server, and obtaining the local access preference of each piece of data in the data set to each edge server;
calculating a global data popularity of the data based on the access frequency of the data; calculating cache benefits of the data cached in each edge server according to the global data popularity and the local access preference of the data to each edge server, and obtaining the cache benefits of each data cached in each edge server;
determining the minimum value of the cache income corresponding to each edge server according to the cache income of each data cached in each edge server; and calculating the caching profit threshold of each edge server according to all the minimum values and the data access delay.
4. The cloud-edge collaborative caching method according to claim 3, wherein the calculating, according to the global data popularity and the local access preference of the data to each of the edge servers, a caching gain of the data cached in each of the edge servers specifically comprises:
for each edge server, summing the global data popularity and the local access preference of the data to the edge server to obtain the access frequency of the data in the edge server;
and calculating the caching benefit of the data cached in the edge server according to the access frequency of the data in the edge server and the data size of the data.
5. The cloud-edge collaborative caching method according to claim 2, wherein the sequentially selectively caching the data in the plurality of edge servers based on the caching condition specifically comprises:
selecting first data as data to be cached;
according to the local access preference of the data to be cached on each edge server, performing descending order arrangement on all the edge servers to obtain an ordered edge server set; selecting a first edge server in the edge server set as an edge server to be stored;
judging whether the residual cache space of the edge server to be stored is larger than the data size of the data to be cached or not to obtain a first judgment result;
if the first judgment result is negative, selecting the next edge server in the edge server set as an edge server to be stored, and returning to the step of judging whether the residual cache space of the edge server to be stored is larger than the data size of the data to be cached;
if the first judgment result is yes, caching the data to be cached in the edge server to be stored, and caching the data to be cached in an edge server which is except the edge server to be stored and meets a preset condition; the preset conditions are that the cache benefit of the data to be cached in the edge server is greater than the cache benefit threshold value of the edge server, and the residual cache space of the edge server is greater than the data size of the data to be cached;
judging whether the data in the data set are cached or not;
and if not, selecting the next data behind the data to be cached as the data to be cached, and returning to the step of performing descending order arrangement on all the edge servers according to the local access preference of the data to be cached on all the edge servers.
6. The cloud-edge collaborative caching method according to claim 1, wherein the managing the cache space of each edge server by using a replacement policy based on perceptual redundancy after receiving a data access request specifically comprises:
after receiving a data access request, judging whether target data requested to be acquired by the data access request is located in an exclusive area of a local edge server or not, and obtaining a second judgment result; the local edge server is an edge server covering a terminal which sends the data access request; the cache space of each edge server is divided into an exclusive area and a redundant area; the exclusive area is used for caching the exclusive data which is not cached by the rest edge servers; the redundant area is used for caching the exclusive data and redundant data cached by other edge servers;
if the second judgment result is yes, updating the priority of the target data;
if the second judgment result is negative, judging whether the target data is located in a redundant area of the local edge server to obtain a third judgment result;
if the third judgment result is yes, judging whether the target data is exclusive data or not; if yes, upgrading and caching the target data to the exclusive area; if not, updating the priority of the target data;
if the third judgment result is negative, judging whether the target data is positioned in the adjacent edge server or not to obtain a fourth judgment result; the adjacent edge server is the rest edge servers except the local edge server;
if the fourth judgment result is yes, accessing the adjacent edge server to add the target data serving as redundant data to a redundant area of the local edge server;
if the fourth judgment result is negative, accessing the cloud data center to add the target data serving as exclusive data to an exclusive area of the local edge server.
7. The cloud-edge cooperative caching method according to claim 6, wherein the upgrading and caching the target data to the exclusive area specifically includes:
judging whether the residual cache space of the exclusive area is larger than the data size of the target data or not;
if yes, upgrading and caching the target data to the exclusive area;
if not, removing the exclusive data with the lowest priority in the exclusive area, and returning to the step of judging whether the residual cache space of the exclusive area is larger than the data size of the target data.
8. The cloud edge collaborative caching method according to claim 6, wherein the adding the target data as redundant data to a redundant area of the local edge server specifically comprises:
judging whether the residual cache space of the redundant area is larger than the data size of the target data or not;
if yes, caching the target data to the redundant area;
if not, removing the data with the lowest priority in the redundant area, and returning to the step of judging whether the residual cache space of the redundant area is larger than the data size of the target data.
9. A cloud edge cooperative cache system based on perceptible redundancy is characterized by comprising a cloud data center, an edge server cluster and a controller; the cloud data center and the edge server cluster are both in communication connection with the controller; the edge server cluster is a cooperative cache domain consisting of a plurality of edge servers; the cache space of each edge server is divided into a redundant area and an exclusive area;
the controller is used for executing the cloud edge collaborative caching method of any one of claims 1 to 8.
10. The cloud-edge collaborative caching system according to claim 9, wherein the controller comprises an access information acquisition module, a cache profit prediction module, an edge data caching module, a request processing module, and a cloud service access module;
the access information acquisition module is used for counting relevant information when processing the data access request to obtain data access information;
the cache profit prediction module is used for calculating local access preference, cache profit and a cache profit threshold value based on the data access information;
the edge data caching module is used for performing caching placement operation by utilizing a cooperative edge data caching strategy based on the perceptible redundancy;
the request processing module is used for carrying out cache replacement operation by utilizing a replacement strategy based on perceptual redundancy.
CN202111631200.3A 2021-12-28 2021-12-28 A cloud-edge collaborative caching method and system based on perceived redundancy Pending CN114500529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111631200.3A CN114500529A (en) 2021-12-28 2021-12-28 A cloud-edge collaborative caching method and system based on perceived redundancy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111631200.3A CN114500529A (en) 2021-12-28 2021-12-28 A cloud-edge collaborative caching method and system based on perceived redundancy

Publications (1)

Publication Number Publication Date
CN114500529A true CN114500529A (en) 2022-05-13

Family

ID=81496702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111631200.3A Pending CN114500529A (en) 2021-12-28 2021-12-28 A cloud-edge collaborative caching method and system based on perceived redundancy

Country Status (1)

Country Link
CN (1) CN114500529A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114980212A (en) * 2022-04-29 2022-08-30 中移互联网有限公司 Edge caching method, apparatus, electronic device, and readable storage medium
CN116320004A (en) * 2023-05-22 2023-06-23 北京金楼世纪科技有限公司 Content caching method and caching service system
CN117714475A (en) * 2023-12-08 2024-03-15 江苏云工场信息技术有限公司 Intelligent management method and system for edge cloud storage
WO2024188037A1 (en) * 2023-03-10 2024-09-19 华为云计算技术有限公司 Function caching method and system
CN119865497A (en) * 2025-03-25 2025-04-22 浙江万雾信息科技有限公司 Data processing method, system and storage medium based on edge cloud

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101923486A (en) * 2010-07-23 2010-12-22 华中科技大学 A method for avoiding data movement in a hardware transactional memory system
CN105049326A (en) * 2015-06-19 2015-11-11 清华大学深圳研究生院 Social content caching method in edge network area
WO2019095402A1 (en) * 2017-11-15 2019-05-23 东南大学 Content popularity prediction-based edge cache system and method therefor
US20190260845A1 (en) * 2017-12-22 2019-08-22 Soochow University Caching method, system, device and readable storage media for edge computing
CN111782612A (en) * 2020-05-14 2020-10-16 北京航空航天大学 Edge caching method for file data in cross-domain virtual data space
CN112887992A (en) * 2021-01-12 2021-06-01 滨州学院 Dense wireless network edge caching method based on access balance core and replacement rate
CN113115362A (en) * 2021-04-16 2021-07-13 三峡大学 Cooperative edge caching method and device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101923486A (en) * 2010-07-23 2010-12-22 华中科技大学 A method for avoiding data movement in a hardware transactional memory system
CN105049326A (en) * 2015-06-19 2015-11-11 清华大学深圳研究生院 Social content caching method in edge network area
WO2019095402A1 (en) * 2017-11-15 2019-05-23 东南大学 Content popularity prediction-based edge cache system and method therefor
US20190260845A1 (en) * 2017-12-22 2019-08-22 Soochow University Caching method, system, device and readable storage media for edge computing
CN111782612A (en) * 2020-05-14 2020-10-16 北京航空航天大学 Edge caching method for file data in cross-domain virtual data space
CN112887992A (en) * 2021-01-12 2021-06-01 滨州学院 Dense wireless network edge caching method based on access balance core and replacement rate
CN113115362A (en) * 2021-04-16 2021-07-13 三峡大学 Cooperative edge caching method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李碧瑶: "边缘网络下的计算卸载和边缘缓存方法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》, no. 5, 15 May 2021 (2021-05-15), pages 1 - 64 *
王俊岭: "基于协同的边缘缓存策略研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》, no. 2, 15 February 2021 (2021-02-15), pages 1 - 70 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114980212A (en) * 2022-04-29 2022-08-30 中移互联网有限公司 Edge caching method, apparatus, electronic device, and readable storage medium
CN114980212B (en) * 2022-04-29 2023-11-21 中移互联网有限公司 Edge caching method, device, electronic device and readable storage medium
WO2024188037A1 (en) * 2023-03-10 2024-09-19 华为云计算技术有限公司 Function caching method and system
CN116320004A (en) * 2023-05-22 2023-06-23 北京金楼世纪科技有限公司 Content caching method and caching service system
CN116320004B (en) * 2023-05-22 2023-08-01 北京金楼世纪科技有限公司 Content caching method and caching service system
CN117714475A (en) * 2023-12-08 2024-03-15 江苏云工场信息技术有限公司 Intelligent management method and system for edge cloud storage
CN117714475B (en) * 2023-12-08 2024-05-14 江苏云工场信息技术有限公司 Intelligent management method and system for edge cloud storage
CN119865497A (en) * 2025-03-25 2025-04-22 浙江万雾信息科技有限公司 Data processing method, system and storage medium based on edge cloud
CN119865497B (en) * 2025-03-25 2025-06-13 浙江万雾信息科技有限公司 Data processing method, system and storage medium based on edge cloud

Similar Documents

Publication Publication Date Title
CN114500529A (en) A cloud-edge collaborative caching method and system based on perceived redundancy
CN103338252B (en) Realizing method of distributed database concurrence storage virtual request mechanism
CN105491156B (en) A kind of the whole network collaborative content cache management system based on SD RAN and method
WO2019119897A1 (en) Edge computing service caching method, system and device, and readable storage medium
EP3089039B1 (en) Cache management method and device
US20110107030A1 (en) Self-organizing methodology for cache cooperation in video distribution networks
CN112218337A (en) A caching strategy decision-making method in mobile edge computing
CN111614754B (en) A Dynamic Adaptive Task Scheduling Method for Cost Efficiency Optimization for Fog Computing
CA3126708A1 (en) Efficient and flexible:load-balancing for clusters of caches under latency constraint
CN112637908B (en) A fine-grained hierarchical edge caching method based on content popularity
CN108366089B (en) A CCN caching method based on content popularity and node importance
CN109982104A (en) The video of mobile awareness prefetches and caching Replacement Decision method in a kind of mobile edge calculations
CN106940696B (en) Information query method and system for SDN multi-layer controller
CN113115362B (en) Cooperative edge caching method and device
CN108900599B (en) Software-defined content-centric network device and cluster cache decision method thereof
CN102868542A (en) Method and system for service quality control in service delivery network
Wang et al. Agile cache replacement in edge computing via offline-online deep reinforcement learning
CN110913430B (en) Active cooperative caching method and cache management device for files in wireless network
CN112631789B (en) Distributed memory system for short video data and video data management method
CN110308965A (en) Rule-based heuristic virtual machine allocation method and system for cloud data center
CN117439655B (en) Space terahertz information center network lightweight caching method and device
CN115484314B (en) An edge cache optimization method for recommendation empowerment under mobile edge computing networks
CN106469193A (en) Multi load metadata I/O service quality performance support method and system
CN112822275B (en) Lightweight caching strategy based on TOPSIS entropy weight method
CN110944050B (en) Reverse proxy server cache dynamic configuration method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20220513

WD01 Invention patent application deemed withdrawn after publication