CN107810501B

CN107810501B - Hierarchical cache filling

Info

Publication number: CN107810501B
Application number: CN201680038924.8A
Authority: CN
Inventors: 安德鲁·陈; 克里斯托弗·布兰德; 丹尼尔·P·埃利斯; 亚历克斯·古塔林
Original assignee: Netflix Inc
Current assignee: Netflix Inc
Priority date: 2015-04-30
Filing date: 2016-04-28
Publication date: 2022-01-11
Anticipated expiration: 2036-04-28
Also published as: CA2984312A1; US11010341B2; KR102031476B1; CN107810501A; SG11201708828UA; US20210271639A1; EP3289490A1; KR20170139671A; AU2021205036A1; US11675740B2; DK3289490T3; US20160321286A1; AU2016255442B2; MX2017013857A; JP6564471B2; EP3289490B1; AU2016255442A1; JP2018524656A; CA2984312C; WO2016176499A1

Abstract

One embodiment of the present invention sets forth a technique for replicating files within a server network. The technique includes determining one or more zone master servers included in a cluster of zone servers and causing each zone master server to retrieve a file from a fill source. The technique further comprises: for at least one local server cluster included in the regional server clusters, one or more local main servers included in the at least one local server cluster are determined, and each local main server is enabled to acquire a file from one regional main server of the one or more regional main servers.

Description

Hierarchical cache filling

Cross Reference to Related Applications

The present disclosure claims the benefit of U.S. provisional patent application serial No. 62/155,430 and attorney docket No. NETF/0094USL filed on 30/4/2015, and U.S. patent application serial No. 15/067,099 and attorney docket No. NETF/0094US filed on 10/3/2016. The subject matter of these related applications is incorporated herein by reference.

Technical Field

Embodiments of the present invention relate generally to data transmission over computer networks and, more particularly, relate to hierarchical cache population.

Background

Many web-based applications provide services such as streaming audio and/or streaming video in a distributed manner over the internet. Generally, such applications operate by distributing multiple copies of each content title (e.g., audio files or video files) across multiple servers located at one or more network locations. By mirroring content between multiple servers, the content may be accessed by a large number of users without the users having to experience significant latency. Moreover, maintaining multiple copies of a particular content title enables a web-based application to quickly and seamlessly recover when a hardware or software failure occurs with respect to a particular server.

To further reduce the latency and overall network requirements associated with providing content titles to users, the servers on which the content titles are stored are typically geographically distributed in one or more areas served by the web application. The web application is then configured to direct each user to a particular server located in the vicinity of the user in order to more efficiently provide the user with the content title.

Managing content titles stored on servers within a large geographic area or distributed across different geographic areas can present several challenges. In particular, copying a given content title to multiple servers located in a large geographic area may consume a significant amount of network resources, e.g., bandwidth. The consumption of network resources increases the cost of web-based applications, particularly when all or part of the network infrastructure is provided by a third party. In addition, conventional techniques for replicating a given content title to multiple servers typically result in the server that first receives the content title experiencing significant fluctuations in bandwidth usage and processing load, for example, when other network servers "flood" the server that first receives the content title with a request for a copy of the content title. Such fluctuations may negatively impact the performance of these "initial" servers, which may reduce the quality of service provided by the initial servers to users accessing the web-based application and/or cause the initial servers to experience software and/or hardware failures.

As previously mentioned, improved techniques for distributing content over a server network would be useful.

Disclosure of Invention

One embodiment of the present invention sets forth a method for replicating a file within a server network. The method includes determining one or more zone master servers included in a cluster of zone servers, and causing each zone master server to retrieve a file from a fill source. The method further comprises the following steps: for at least one local server cluster included in the regional server clusters, one or more local main servers included in the at least one local server cluster are determined, and each local main server is enabled to obtain a file from one regional main server in the one or more regional main servers.

In addition, further embodiments provide a control server and a non-transitory computer readable medium configured to implement the above method.

At least one advantage of the disclosed techniques is that each file can be replicated in a network in a predictable, decentralized, and highly fault-tolerant manner via a particular content server tier without the need for an entity that maintains a primary index. In addition, a predetermined delay may be assigned to each tier in order to prevent lower tiers from overwhelming content servers included at higher tiers, which reduces the severity of server load fluctuations when new or existing files are replicated over the network infrastructure. Further, because the content servers included in each tier are determined based on identifiers associated with the files, each file may be propagated through the network infrastructure via a different sequence of content servers. Thus, the network and processing load is more evenly distributed among the content servers, which improves the overall quality of service provided by the content servers to users accessing the web-based application.

Drawings

So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.

FIG. 1 illustrates a network infrastructure for distributing content to content servers and endpoint devices, in accordance with various embodiments of the present invention;

FIG. 2 is a block diagram of a content server that may be implemented in conjunction with the network infrastructure of FIG. 1, in accordance with various embodiments of the invention;

FIG. 3 is a block diagram of a control server that may be implemented in conjunction with the network infrastructure of FIG. 1, in accordance with various embodiments of the invention;

FIG. 4 illustrates how the content servers of FIG. 1 are geographically distributed, according to various embodiments of the invention;

5A-5C illustrate techniques for replicating files between different tiers of the geographically distributed content servers of FIG. 1, in accordance with various embodiments of the invention; and

fig. 6A and 6B illustrate a flow diagram of method steps for replicating files among geographically distributed content servers, according to various embodiments of the invention.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without one or more of these specific details.

Fig. 1 illustrates a network infrastructure 100 for distributing content to a content server 110 and endpoint devices 115 in accordance with various embodiments of the invention. As shown, the network infrastructure 100 includes a content server 110, a control server 120, and an endpoint device 115, each of which is connected via a communication network 105.

Each endpoint device 115 communicates with one or more content servers 110 (also referred to as "caches" or "nodes") via the network 105 to download content, such as text data, graphics data, audio data, video data, and other types of data. The downloadable content, also referred to herein as a "file," is then presented to the user of the one or more endpoint devices 115. In various embodiments, endpoint device 115 may include a computer system, a set-top box, a mobile computer, a smartphone, a tablet, a console, and a handheld video game system, a Digital Video Recorder (DVR), a DVD player, a connected digital television, a dedicated media streaming device (e.g., a set-top box), and/or any other technically feasible computing platform that has network connectivity and is capable of presenting content (e.g., text, images, video, and/or audio content) to a user.

Each content server 110 may include a web server, database, and server application 217 configured to communicate with the control server 120 to determine the location and availability of various files tracked and managed by the control server 120. Each content server 110 may also communicate with a population source 130 and one or more other content servers 110 to "populate" each content server 110 with copies of various files. Additionally, content server 110 may respond to receiving a request for a file from endpoint device 115. The file may then be distributed from content server 110 or via a broader content distribution network. In some embodiments, content server 110 enables a user to authenticate (e.g., using a username and password) to access files stored on content server 110. Although only a single control server 120 is shown in FIG. 1, in various embodiments, multiple control servers 120 may be implemented to track and manage files.

In various embodiments, the fill source 130 may comprise an online storage service (e.g.,

a simple storage service is provided for the user,

cloud storage, etc.), where a file directory comprising thousands or millions of files is stored and accessed to populate the interiorA content server 110. Although only a single fill source 130 is shown in FIG. 1, in various embodiments, multiple fill sources 130 may be implemented to service requests for files.

Fig. 2 is a block diagram of a content server 110 that may be implemented in conjunction with the network infrastructure 100 of fig. 1, in accordance with various embodiments of the invention. As shown, content server 110 includes, but is not limited to, a Central Processing Unit (CPU)204, a system disk 206, an input/output (I/O) device interface 208, a network interface 210, an interconnect 212, and a system memory 214.

The CPU 204 is configured to retrieve and execute programming instructions stored in the system memory 214, such as a server application 217. Similarly, the CPU 204 is configured to store and retrieve application data from the system memory 214. The interconnect 212 is configured to facilitate transfer of data (e.g., programming instructions and application data) between the CPU 204, the system disk 206, the I/O device interface 208, the network interface 210, and the system memory 214. The I/O device interface 208 is configured to receive input data from the I/O device 216 and send the input data to the CPU 204 via the interconnect 212. For example, the I/O devices 216 may include one or more buttons, a keyboard, a mouse, and/or other input devices. The I/O device interface 208 is also configured to receive output data from the CPU 204 via the interconnect 212 and to transmit the output data to the I/O device 216.

The system disk 206 may include one or more hard disk drives, solid state storage devices, or similar storage devices. The system disk 206 is configured to store non-volatile data, such as files 218 (e.g., audio files, video files, and/or subtitles) associated with a content directory. The file 218 may then be retrieved by one or more endpoint devices 115 via the network 105. In some embodiments, network interface 210 is configured to operate in compliance with the ethernet standard.

System memory 214 includes server application 217, server application 217 configured to service requests for files 218 received from endpoint devices 115 and other content servers 110. When the server application 217 receives a request for a file 218, the server application 217 obtains the corresponding file 218 from the system disk 206 and sends the file 218 to the endpoint device 115 or the content server 110 via the network 105. Server application 217 is also configured to request instructions from control server 120, e.g., location(s) and/or time(s) of a particular file may be requested from a fill source 130, a particular content server 110, etc.

Fig. 3 is a block diagram of a control server 120 that may be implemented in conjunction with the network infrastructure 100 of fig. 1, in accordance with various embodiments of the invention. As shown, control server 120 includes, but is not limited to, a Central Processing Unit (CPU)304, a system disk 306, an input/output (I/O) device interface 308, a network interface 310, an interconnect 312, and a system memory 314.

The CPU 304 is configured to retrieve and execute programming instructions stored in the system memory 314, such as a control application 317. Similarly, the CPU 304 is configured to store application data and retrieve application data from a system memory 314 and database 318 stored in the system disk 306. The interconnect 312 is configured to facilitate data transfer between the CPU 304, the system disk 306, the I/O device interface 308, the network interface 310, and the system memory 314. The I/O device interface 308 is configured to transfer input data and output data between the I/O device 316 and the CPU 304 over the interconnect 312. The system disk 306 may include one or more hard disk drives, solid state storage devices, and the like. The system disk 206 is configured as a database 318 that stores information associated with the content server 110, the fill source(s) 130, and the files 218.

The system memory 314 includes a control application 317, the control application 317 configured to access information stored in the database 318 and process the information to determine the manner in which a particular file 218 is to be replicated between content servers 110 included in the network infrastructure 100. For example, when a copy of new file 218 is to be populated to content server 110, a list of Identifiers (IDs) associated with content server 110 (referred to as server IDs) is accessed from database 318. The control application 317 then processes the server ID in conjunction with the file ID associated with the new file 218 to determine how and when the new file 218 will be replicated between the content servers 210 included in the network infrastructure 100, as described in further detail below in conjunction with FIGS. 4-6.

As previously described herein, managing content (e.g., audio and video files) stored on content servers 110 distributed over a large geographic area presents various challenges. In particular, to serve a wide variety of endpoint devices 115 with different networks and processing capabilities, each content title may be encoded at multiple (e.g., 10 or more) different bit rates. Thus, a single content title, such as a movie or television program, may be associated with multiple files 218 (each file 218 encoding the content title at a different bit rate), where each file 218 may be stored by a content server 110 configured to provide the content title to an endpoint device 115.

Thus, attempting to populate content server 110 with a large catalog of content titles, where each content title is associated with multiple files 218, may consume a significant amount of network resources. Thus, if each content server 110 obtains files 218 directly from a third party source (e.g., the fill source 130) and/or over a third party network, significant access and bandwidth costs may result.

On the other hand, trying to populate content servers 110 with only a small number of local content servers 110 that receive files 218 first to minimize the use of third party sources/networks may place an undue burden on the original content server(s) 110. Such a burden may cause network and processing load fluctuations on the original content server 110 when other content servers 110 flood the content server(s) 110 with requests for files 218.

Thus, in various embodiments, each file 218 may be replicated within the network infrastructure 100 via a particular layer of the content server 110, where the network server 110 may make precise calculations based on the server ID included in the network infrastructure 100 and the file ID associated with the file 218. Because the content server 110 in which the file 218 is to be located can be accurately computed based on the server ID and the file ID, access to the file 218 is deterministic, which reduces or eliminates the occurrence of cache misses. The number of content servers 110 included per tier may be controlled by specifying a replication factor for each file 218 or for each tier, for example, based on the popularity of a particular file 218 and the demand for a particular file 218. Thus, the degree to which the content server 110 is populated locally may be adjusted, which reduces third party network and storage costs. Further, because the content servers 110 included in each tier are determined based on file IDs, each file 218 may be replicated within the network infrastructure 100 via different sequences of content servers 110. Thus, the network and processing load is more evenly distributed among the content servers 110, which improves the overall quality of service provided to users accessing the web-based application via the content servers 110.

In some embodiments, the plurality of files 218 are grouped by category, and a unique file ID is assigned to the file 218 associated with each category. For example, a file 218 associated with a content title a (e.g., a movie or television program) may be assigned a first file ID, while a file 218 associated with a content title B is assigned a second file ID. In addition, in some embodiments, different file IDs are assigned to encode the content titles separately at different bit rates. That is, in such embodiments, files 218 associated with content title a and a first bit rate may be assigned a first file ID, while files 218 associated with content title a and a second bit rate may be assigned a second file ID. Such an embodiment enables a complete set of files 218 associated with a particular category to be stored together on each of one or more content servers 110.

Additionally, in some embodiments, a unique file ID is assigned to each file 218. For example, each of the files 218 associated with the content title a and the first bit rate may be assigned a different file ID. Such embodiments enable distribution of files 218 associated with a particular category across multiple content servers 110, e.g., to enable parallel transmission of files 218 to a particular endpoint device 115. Further, in some embodiments, the granularity at which unique file IDs are assigned to files 218 may vary from category to category.

In various embodiments, the file ID associated with a particular file 218 is encoded in a JavaScript object notation (JSON) object. JavaScript object notation (JSON) is a transformation that represents a data structure called an "object" in a manner generally compatible with JavaScript scripting languages. An object may include one or more elements, where each element is specified with a key. Each element may be a string, a number, another object, an array, a boolean value, or a null. An array is an ordered list of elements delineated by brackets "[" and "]". Elements in the array are separated by commas. An example JSON object associated with file 218 is given below in table 1.

Table 1: example JSON object

In the example JSON object given above, the first and last braces represent the beginning and end of the object, and "video" is the key to the array of videos delimited by the square brackets "[" and "]". The array includes two elements, each element being indicated by a parenthesis. The first such element is an object that includes an "id" field, a "title" field, a "boxart" field, and a "synopsis" field with values as shown. Similarly, the second element in the array has the value as shown. Thus, in this example, a file ID 23432 associated with a first file 218 ("perpetual sunlight for beautiful heart") and a file ID 23521 associated with a second file 218 ("mask") are specified.

Further, in some embodiments, content server 110 may request download instructions (e.g., download location, time delay, etc.) by transferring a JSON object comprising a plurality of file IDs stored in an array to control application 317. An example JSON object containing such an array of file IDs is given in table 2.

Table 2: example JSON array

Fig. 4 illustrates how the content servers 110 of fig. 1 are geographically distributed, according to various embodiments of the invention. As shown, content servers 110 may be organized into various clusters, including a region cluster 410, a local cluster 420, and a manifest cluster 430.

For example, in the exemplary geographic distribution shown in FIG. 4, the content servers 110 are organized into an area cluster 410-1 and an area cluster 410-2. In some embodiments, each area cluster 410 may be associated with a different city, state, country, time zone, continent, and the like. Each zone cluster 410 includes one or more local clusters 420. For example, area cluster 410-1 includes local cluster 420-1 and local cluster 420-2, and area cluster 410-2 includes local cluster 420-3, local cluster 420-4, and local cluster 420-5. Each local cluster 420 includes one or more manifest clusters 430, and each manifest cluster 430 includes one or more content servers 110. For example, each manifest cluster 430 shown in FIG. 4 includes five content servers 110. However, any number of content servers 110 may be included in each manifest cluster 430. Further, any number of area clusters 410, local clusters 420, manifest clusters 430, and content servers 110 may be included in the network infrastructure 100.

Fig. 5A-5C illustrate techniques for replicating files 218 between different tiers of the geographically distributed content server 110 of fig. 1, according to various embodiments of the invention. To meet peak traffic demands and redundancy demands, multiple copies of each file 218 may be replicated on the content servers 110 included in each area cluster 410. Thus, in various embodiments, the control application 317 may perform one or more hash operations (e.g., consistent hash operations described below) to replicate the file 218 evenly among the plurality of content servers 110 without having to pre-compute where the file should be stored and without having to store the index of the content server 110 in a central server. Although the techniques described herein are implemented in connection with a particular hash operation, any technically feasible operation or technique may be used to select the content servers 110 included in a tier through which a particular file 218 is replicated by those content servers 110 in the tier.

In various embodiments, control application 317 may generate a hash value for each content server 110 included by a particular zone cluster 410 by performing a hash operation (e.g., applying MD5 message digest algorithm) on each server ID associated with content server 110. The control application 317 then sorts the resulting hash values to form a hash data structure (e.g., hash ring, hash table, etc.). To determine which content servers 110 are to be used to replicate a particular file 218 on the zone cluster 410, the control application 317 then generates a hash value for the file 218 by applying the same (or different) hash operation to the file ID associated with the file 218.

The control application 317 then compares the hash value associated with the file 218 to the hash data structure and selects the content server 110 having the hash value closest to the hash value of the file 218 as the zone master server 510 for that particular file 218. Further, if a Replication Factor (RF) >1 is specified for the file 218 and/or zone master server 510 tier, the control application 317 selects the next one or more additional content servers 110 in the hash data structure as the zone master server 510 (e.g., based on the hash value). In some embodiments, the control application 317 selects the next content server 110 in the hash data structure by starting at the hash value associated with the first zone master server 510 selected for the zone cluster 410, by "walking" around the hash ring or along the hash table until the desired number of zone master servers 510 are selected.

Once the zone primary server 510 associated with the zone cluster 410 is selected, the file 218 is immediately extracted 515 from the fill source 130 based on instructions received from the control application 317, as shown in FIG. 5A. For clarity of explanation, only one zone master server 510 is shown in each zone cluster 410 shown in FIG. 5A. However, in embodiments where RF >1, multiple regional master servers 510 are selected for a particular file 218, and these regional master servers 510 can immediately retrieve the file 218 from the fill source 130.

Next, the control application 317 filters the hash data structure to include only the content servers 110 located in the particular local cluster 420. The control application 317 then selects one or more next content servers 110 included in the filtered hash data structure as the local master server 520 of the local cluster 420. For example, referring to FIG. 5B, when determining one or more local master servers 520 of the local cluster 420-1, the control application 317 will filter the hash data structure to exclude all content servers 110 not included in the local cluster 420-1. The control application 317 then selects the next content server 110 as the local master server 520 for the local cluster 420-1 by the remaining hash values in the filtered hash data structure. Further, if RF >1 is specified for the file 218 and/or for the local host server 520 tier, the control application 317 selects one or more next content servers 110 included in the filtered hash data structure as the local host server 520 for the local cluster 420-1. The control application 317 then repeats the process for each local cluster 420 included in the zone cluster 410.

As shown in fig. 5B, the local master server 520 extracts the file 218 from the regional master server 510. The control application 317 also provides instructions to the local host server 520 as to how and when the file 218 may be extracted. In some embodiments, to prevent the regional master server(s) 510 from being overwhelmed by requests for files 218, the control application 317 instructs the local master server 520 associated with each local cluster 420 to wait a first predetermined period of time (e.g., 1 to 3 hours after the regional master server(s) 510 extracts 515 files from the fill source 130) before extracting files 218 from the regional master server(s) 510. Additionally, the control application 317 may instruct the local host server 520 to extract the file 218 from a more expensive source (e.g., the fill source 130) only after a second predetermined period of time longer than the first predetermined period of time (e.g., 2 to 4 hours after the regional host server(s) 510 extracts 515 the file 218 from the fill source 130). Implementing the second predetermined time period (after which the local master server 520 may extract files from more expensive sources) balances reducing network costs and reducing long delays that occur due to network problems, software/hardware failures, etc. associated with the respective regional master server 510. Further, in some embodiments, the local master server 520 may obtain the file 218 from the zone master servers 510 associated with different zone clusters 410.

Alternatively, the control application 317 may again filter the hash data structure to include only the content servers 110 located in a particular manifest cluster 430. The control application 317 then selects one or more next content servers 110 included in the filtered hash data structure as the manifest master server 530 for the manifest cluster 430. For example, referring to FIG. 5C, when determining one or more manifest master servers 530 for the manifest cluster 430-1, the control application 317 may filter the hash data structure to exclude all content servers 110 not included in the manifest cluster 430-1. The control application 317 then selects the next content server 110 as the manifest master server 530-3 for the manifest cluster 430-1 by the remaining hash values in the filtered hash data structure. Further, if RF >1 is specified for the file 218 and/or manifest master 530 layers, the control application 317 may select one or more next content servers 110 included in the filtered hash data structure as the manifest master 530 for the manifest cluster 430-1. The control application 317 then repeats the process for each manifest cluster 430 included in the area cluster 410. In addition, the control application 317 repeats the entire process of determining the area master server 510, the local master server 520, and the inventory master server 530 for each area cluster 410 included in the infrastructure network 100.

As shown in FIG. 5C, manifest host server 530 extracts 535 file 218 from local host server 520. In some embodiments, to prevent the local master server(s) 520 from being overwhelmed by requests for files 218, the control application 317 instructs the manifest master server 530 associated with each manifest cluster 430 to wait a first predetermined period of time (e.g., 1 to 3 hours after the local master server(s) 520 obtain 525 the files 218 from the regional master server 510) before retrieving the files 218 from the local master server(s) 520. Further, the control application 317 may instruct the manifest master server 530 to fetch the file 218 from a more expensive source (e.g., the regional master server(s) 510) after a second predetermined period of time (e.g., 2 to 4 hours after the local master server(s) 520 fetch 525 the file 218 from the regional master server (s)). Further, the control application 317 may instruct the manifest master server 530 to retrieve the file 218 from the fill source 130 after a third predetermined time period (e.g., 4 to 6 hours after the local master server(s) 520 retrieves 525 the file 218 from the regional master server(s) 510).

Once the manifest master server 530 within a particular manifest cluster 430 receives a copy of the file 218, each non-master server 540 included in the manifest cluster 430 may extract the file 218 from the manifest master server 530. Advantageously, the abstraction between the non-master server 540 and the manifest master server 530 within the manifest cluster 430 does not typically incur any third party network or storage charges. In some embodiments, the control application 317 instructs the non-primary server 540 to wait an optional first predetermined period of time (e.g., 1 to 3 hours after the manifest primary server(s) 530 extract 535 the file 218 from the local primary server(s) 530) before extracting the file 218 from the manifest primary server 530. Further, the control application 317 may instruct the non-home server 540 to wait before fetching the file 218 from the more expensive source (e.g., local home server(s) 520, regional home server(s) 510, and the fill source 130).

To avoid the "hot neighbor" problem (where the content server 110, in turn located in the hash data structure, receives multiple popular files 218 and therefore suffers a performance degradation), the control application 317 optionally combines each server ID with multiple values before generating the hash data structure. For example, the control application 317 may combine each server ID with a fixed range of 1000 constants. The control application 317 then performs a hash operation on each combination to generate 1000 hash values for each content server 110. Thus, each content server 110 will appear multiple times in the hash data structure, which greatly reduces the likelihood that the same subset of content servers 110 will receive multiple popular files 218. An example is provided below.

In a specific example, assume that a particular area cluster 410 includes 20 content servers. To create a hash data structure (e.g., a hash ring), the control application 317 combines each of the 20 server IDs with a fixed range of constants (e.g., 1 to 1000), applies a hash operation to each combination, and sorts the generated 20000 hash values to produce a hash ring. Assuming further that the content server 110 is named from a to T, the resulting hash ring may include the following hash values:

00000000:D

00003064:A

00005662:S

00007174:N

00009947:A

00012516:T

00015577:Q

00016652:R

00021625:L

00025057:Q

00028665:K

…

23451234:F

23453753:R

23456802:Q

23459919:I

23462687:A

23463273:I

23466229:T

23471069:M

23475876:G

23476011:T

(Hash Ring loops back to the beginning)

Given RF 3, the control application 317 then performs a hash operation on the file ID associated with the file 218 and selects the first 3 content servers 110 encountered on the hash ring. For example, assuming that performing a hash operation on the file ID results in a hash value 00015500, the control application 317 will select Q as the primary content server 110, and will select R and L as the replica content server 110 because 00015577 is the first content server 110 to encounter after 00015500.

Fig. 6A and 6B illustrate a flow diagram of method steps for replicating files 218 on geographically distributed content servers 110, according to various embodiments of the invention. Although the method steps are described in conjunction with the systems of fig. 1-5C, persons skilled in the art will understand that any system configured to perform the method steps in any order is within the scope of the present invention.

As shown in fig. 6A, the method 600 begins at step 605, where the control application 317 generates a hash data structure, such as a hash ring or hash table, by performing a first hash operation on a server Identifier (ID) associated with each content server 110 included in a particular zone cluster 410 of content servers 110. At step 610, the control application 317 generates a hash value by performing a second hash operation on the file ID associated with the file 218. In some embodiments, the first hash operation and the second hash operation are the same (e.g., MD5 hash operations). However, in various embodiments, the first hash operation and the second hash operation may comprise different operations. Further, in some embodiments, the first hash operation and/or the second hash operation may include one or more transformations that modify the server ID and/or the file ID.

At step 615, the control application 317 determines one or more zone master servers 510 included in the zone cluster 410 by comparing the first hash value to the hash data structure. In various embodiments, one or more zone master servers 510 are comprised of a subset of the content servers 110 included in the zone cluster 410. Then, at step 620, the control application 317 causes each zone host server 510 to extract the file 218 from the fill source 130. For example, the control application 317 may cause the zone host server 510 to extract the file 218 by issuing an instruction to each zone host server 510 in response to receiving a JSON array from the zone host server 510 and by specifying a file ID.

Next, at step 625, the control application 317 selects a local cluster 420 of content servers 110 included in the regional cluster 410. At step 630, the control application 317 filters the hash data structure to determine one or more local host servers 520 included in the local cluster 420 based on the hash value generated for the file ID. In various embodiments, one or more local master servers 520 are comprised of a subset of the content servers 110 included in the local cluster 420. At step 635, the control application 317 then causes each local master server 520 to wait a predetermined period of time before extracting the file 218 from the regional master server 510. In some embodiments, the predetermined time that the local master server 520 needs to wait before retrieving the file 218 from the regional master server 510 is the same for each local master server 520, while in other embodiments the predetermined time may vary from local master server 520.

At step 640, the control application 317 determines whether one or more local master servers 520 need to be determined for the additional local cluster 420. If the control application 317 determines that one or more local master servers 520 need to be determined for additional local clusters 420, the method 600 returns to step 625 described above. If the control application 317 determines that the local master server(s) 520 need not be determined for additional local clusters 420, the method 600 proceeds to step 645 shown in FIG. 6B, where the control application 317 selects the manifest cluster 430 for the content server 110 at step 645.

In step 650, the control application 317 filters the hash data structure to determine one or more manifest master servers 530 included in the manifest cluster 430 based on the hash value. In various embodiments, one or more manifest master servers 530 are comprised of a subset of the content servers 110 included in the manifest cluster 430.

Because regional master server(s) 510, local master server(s) 520, and manifest master server(s) 530 are determined based on the server ID and file ID, the location in the infrastructure network 100 where the file 218 is replicated and the time at which the file 218 can be extracted from each location can be determined by any content server 110 having a record of the server ID and file ID. Thus, files 218 may be replicated in a decentralized, highly fault-tolerant manner through the infrastructure network 100 without requiring an entity, such as the control server 217, to maintain the primary index.

At step 655, the control application 317 causes each manifest host server 530 to wait a predetermined period of time before retrieving the file 218 from the local host server 520. In some embodiments, the predetermined time that manifest master server 530 needs to wait before retrieving a file 218 from local master server 520 is the same for each manifest master server 530, while in other embodiments the predetermined time may vary from one manifest master server 530 to another.

At step 660, the control application 317 determines whether one or more manifest master servers 530 need to be determined for additional manifest clusters 430. If the control application 317 determines that one or more inventory master servers 530 need to be determined for additional inventory clusters 430, the method 600 returns to step 645 described above. If the control application 317 determines that the manifest master server 530 does not need to be determined for additional manifest clusters 430, the method 600 proceeds to step 665 where the control application 317 causes each non-master server 540 to wait a predetermined period of time before retrieving the file 218 from the manifest master server 530 at step 665. In some embodiments, the predetermined time that the non-primary server 540 needs to wait before retrieving the file 218 from the manifest primary server 530 is the same for each non-primary server 540, while in other embodiments the predetermined time may vary from non-primary server 540 to non-primary server 540.

At step 670, the control application 317 determines whether the regional master server(s) 510, the local master server(s) 520, and/or the manifest master server(s) 530 need to be determined for additional files 218. If the control application 317 determines that the regional master server(s) 510, the local master server(s) 520, and/or the manifest master server(s) 530 need to be determined for additional files 218, the method 600 returns to step 610. If the control application 317 determines that the regional master server(s) 510, the local master server(s) 520, and/or the manifest master server(s) 530 need not be determined for additional files 218, the method 600 terminates.

In summary, the control application performs a hash operation on the plurality of server IDs to generate a hash data structure. The control application then performs a hash operation on the file ID and compares the hash value to the hash data structure to select one or more zone master servers. The hash data structure is then filtered for each local cluster and manifest cluster to determine the local master server and manifest master server, respectively, through which the file is to be replicated. The control application also assigns a predetermined delay to each of the local master server tier, the manifest master server tier, and the non-master server tier to control the rate at which files are replicated over the infrastructure network.

At least one advantage of the disclosed techniques is that each file can be replicated in a network in a predictable, decentralized, and highly fault-tolerant manner via a particular tier of content servers without the need for an entity that maintains a primary index. In addition, the number of content servers included in each tier through which a file is copied may be controlled, for example, based on the popularity and desirability of the file by specifying a copy factor on a per tier basis or on a per file basis. Furthermore, a predetermined delay may be assigned to each tier in order to prevent lower tiers from overwhelming content servers included at higher tiers, which reduces the severity of server load fluctuations. Further, because the content servers included in each tier are determined based on identifiers associated with the files, each file may be propagated through the network infrastructure via a different sequence of content servers. Thus, the network and processing load is more evenly distributed among the content servers, which improves the overall quality of service provided by the content servers to users who are accessing the web-based application.

The description of the various embodiments has been presented for purposes of illustration but is not intended to be exhaustive or limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Aspects of the present embodiments may be embodied as a system, method or computer program product. Accordingly, aspects of the present disclosure may take the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," module "or" system. Furthermore, aspects of the present disclosure may take the form of a computer program product (having computer-readable program code embodied thereon) embodied in one or more computer-readable media.

Any combination of one or more computer-readable media may be used. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, or a field programmable array.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The invention has been described above with reference to specific embodiments. However, those skilled in the art will understand that various modifications and changes may be made to the embodiments without departing from the broader spirit and scope of the invention as set forth in the appended claims. For example, and without limitation, although many of the descriptions herein refer to specific types of files, hash operations, and server tiers, those skilled in the art will appreciate that the systems and techniques described herein are applicable to other types of data files, algorithms, and hierarchies. The foregoing description and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

While the foregoing is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims

1. A hierarchical cache filling method, comprising:

determining one or more regional main servers included in a regional server cluster;

causing each regional master server to obtain files from a fill source; and

for at least one local server cluster included in the regional server clusters:

determining one or more local master servers included in the at least one local server cluster; and

causing each local master server to retrieve the file from one of the one or more regional master servers,

wherein the method further comprises:

performing a first hash operation based on a plurality of server identifier IDs to generate a hash data structure, wherein each server identifier ID included in the plurality of server identifier IDs is associated with a different server included in the cluster of zone servers; and

performing a second hash operation based on the file ID associated with the file obtained from the fill source to generate a first hash value,

wherein determining the one or more zone master servers comprises comparing the first hash value to the hash data structure.

2. The method of claim 1, further comprising: for each inventory server cluster included in the at least one local server cluster:

determining one or more manifest main servers included in the cluster of manifest servers; and

causing each manifest master server to retrieve the file from a local master server associated with the local server cluster.

3. The method of claim 2, further comprising: causing at least one local master server to wait a first predetermined period of time before retrieving the file from one of the one or more regional master servers, and causing at least one manifest master server to wait a second predetermined period of time before retrieving the file from the at least one local master server.

4. The method of claim 1, wherein determining the one or more local master servers comprises performing a filtering operation on the hash data structure based on the first hash value.

5. The method of claim 4, further comprising: for each inventory server cluster included in the at least one local server cluster:

filtering the hash data structure based on the first hash value to determine one or more manifest master servers included in the cluster of manifest servers; and

6. The method of claim 1, wherein the hash data structure comprises a hash ring and the first and second hash operations comprise consistent hash operations.

7. The method of claim 1, wherein performing the first hash operation comprises, for each server identifier ID:

combining the server identifier ID with a plurality of different values to generate a plurality of different server identifier ID instances;

performing the first hash operation on each of the plurality of server identifier ID instances to generate a plurality of hash data structure values; and

storing the plurality of hash data structure values in a hash data structure.

8. The method of claim 1, wherein the number of regional master servers and the number of local master servers is based on at least one replication factor associated with the file.

9. A control server, comprising:

a memory storing a control application; and

a processor coupled to the memory, wherein the control application, when executed by the processor, configures the processor to:

determining a first plurality of regional master servers included in a regional server cluster;

enabling each regional master server to acquire files from the filling source; and is

For at least one local server cluster included in the regional server clusters:

determining one or more local master servers included in the at least one local server cluster; and is

Causing each local master server to retrieve the file from one of the first plurality of regional master servers,

wherein the control application further configures the processor to:

wherein the processor is configured to determine the first plurality of zone master servers by comparing the first hash value to the hash data structure.

10. The control server of claim 9, wherein the control application further configures the processor to: for each inventory server cluster included in the at least one local server cluster:

11. The control server of claim 10, wherein the control application further configures the processor to: causing at least one local master server to wait a first predetermined period of time before retrieving the file from one of the first plurality of regional master servers, and causing at least one manifest master server to wait a second predetermined period of time before retrieving the file from the at least one local master server.

12. The control server of claim 9, wherein the control application further configures the processor to:

performing a third hash operation based on a second file ID associated with a second file to generate a second hash value;

comparing the second hash value to the hash data structure to determine a second plurality of zone master servers included in the cluster of zone servers, wherein the second plurality of zone master servers includes at least one server not included in the first plurality of zone master servers;

enabling each regional master server to obtain the second file from the filling source; and

for at least one local server cluster included in the regional server clusters:

performing a filtering operation on the hash data structure based on the second hash value to determine a plurality of local master servers included in the local server cluster, wherein the plurality of local master servers includes at least one server not included in the one or more local master servers; and

causing each local master server to retrieve the second file from one of the second plurality of regional master servers.

13. The control server of claim 9, wherein the control application further configures the processor to: in response to receiving an array of file IDs from servers included in the cluster of zone servers, performing the first hash operation and comparing the first hash value to the hash data structure.

14. The control server of claim 13, wherein the array of file IDs is included in a JavaScript object notation JSON object received from the server.

15. The control server of claim 9, wherein the control application further configures the processor to determine the one or more local master servers by, for each local server cluster included in the at least one local server cluster:

excluding from the hash data structure all servers that are not included in the local server cluster; and

determining one or more servers remaining in the hash data structure having hash values closest to the first hash value to select the one or more local master servers.

16. The control server of claim 9, wherein the first number of servers included in the first plurality of regional master servers is based on a first replication factor associated with a regional master server tier and the second number of servers included in the one or more local master servers is based on a second replication factor associated with a local master server tier.

17. A non-transitory computer readable storage medium comprising instructions that, when executed by a processor, cause the processor to perform the steps of:

performing a first hash operation based on a plurality of server identifier IDs to generate a hash ring, wherein each server identifier ID included in the plurality of server identifier IDs is associated with a different server included in a regional server cluster;

performing a second hash operation based on a file ID associated with the file obtained from the fill source to generate a first hash value;

determining one or more hash values on a hash ring that are closest to the first hash value to select one or more zone master servers included in the cluster of zone servers;

enabling each regional master server to obtain the file from a filling source; and

for each local server cluster of a plurality of local server clusters included in the regional server cluster:

performing a filtering operation on the hash ring based on the first hash value to determine one or more local master servers included in the local server cluster; and

causing each local master server to retrieve the file from one of the one or more regional master servers.

18. The non-transitory computer-readable storage medium of claim 17, wherein performing the filtering operation on the hash ring comprises: for each local server cluster:

excluding from the hash ring all servers not included in the local server cluster; and

determining one or more servers remaining in the hash ring having hash values closest to the first hash value to select the one or more local master servers.