WO2017039538A1

WO2017039538A1 - Systems and methods for unified storage services

Info

Publication number: WO2017039538A1
Application number: PCT/SG2016/050417
Authority: WO
Inventors: Xiaohui Liu
Original assignee: Xiaohui Liu
Priority date: 2015-08-28
Filing date: 2016-08-26
Publication date: 2017-03-09
Also published as: SG10201607160PA

Abstract

Systems and methods are disclosed herein for unified storage services through performing data operations, including encryption, encoding, decoding and distribution within a storage environment. The systems support a variety of users and storage devices that connect to the system in a network environment, which permits data transfer over networks (302), such as the Internet, LAN, and SAN. The systems allow users to subscribe to a plurality of storage services (310, 320, 330) and provide a single simple interface (110) to manage the data stored. Methods are disclosed for dividing data into chunks to be distributed to a plurality of storage services (310, 320, 330). Methods are also disclosed for adding redundancy to enhance the reliability of the data. Methods are also disclosed for performing encryption to enhance the security of the data. Methods are disclosed for identifying suitable encoding and encryption configurations, and storage locations for data files subject to a storage policy.

Description

Systems and Methods for Unified Storage Services

Field of invention

This invention generally relates to systems and methods for providing Unified Storage Services (USSs) utilizing multiple storage services.

Background of the invention

Nowadays, the storage architecture for companies or consumers become more and more complicated. For example, more and more data are to be transferred to cloud storage. There are numerous cloud storage providers such as Dropbox, Amazon, Google Drive and etc. A single cloud storage provider may provide cloud storage across different geographical locations. The cloud storage allows the access of data from anywhere at any time as the data is stored in the data centre from the respective cloud storage providers. However, storing data in a single cloud storage provider has a variety of problems in various aspects such as cost, reliability and security. To be more specific, firstly, it can be prohibitively expensive to switch from one provider to another. Secondly, storing data in a single cloud storage provider has higher chance of potential data loss or unavailable if the particular provider suffers outages or goes out of business. Thirdly, storing data in a single storage provider is prone to data leakage from the single storage provider and the particular storage provider theoretically has the access to the data. Nowadays, more users have more than one cloud storage account. Sometimes, the data may be even stored in a combination of cloud storages and local storages. Managing multiple cloud storage service accounts, multiple types of storage mediums and the data stored in different accounts and mediums are tedious and challenging.

Therefore, there remains a need for effective systems and methods to allow users to enjoy advantages of a plurality storage services, and at the same time to utilize a plurality of storages easily to avoid above mentioned issues.

Summary of the invention

The above and other problems are solved and an advance in the art is made by systems and methods for Unified Storage Services (USS) in accordance with this invention. A first advantage of systems and methods to provide USS services is simplicity. Only simple operations related to each storage services including registration/deregistration or update are required to be conducted. The technical challenges about how the data is stored across multiple Storage Services (SSs) are transparent to the users or application of the users. A single unified space is provided to each user by the USS services. The unified space actually has an aggregated capacity obtained from a plurality of SSs. As the systems and methods provide cloud storage services, the USS services can be accessed from any devices including computers and mobile devices. A second advantage of the systems and methods to provide USS services is that the user does not suffer from data loss or unavailability due to the outage of a subset of his subscribed SSs. A third advantage of the systems and methods for USS services is that not a single provider has a complete file stored. As a result, even if a subset of providers has data leakage, the whole data is to be stolen. A fourth advantage of the systems and methods to provide USS services is that the latency for retrieving the data file stored is shortened. In response to these and other needs, the embodiments of the invention provide USS services. The system utilizes more than one SS. It is to be appreciated by one skilled in the art that cloud storage service from different cloud service providers can be considered as different SSs, cloud storage from the same service provider but with different geographical locations can also be considered as different SSs, various types of private cloud and/or local storage services can also be considered as different SSs.

Brief Description of the Drawings

FIG. 1 illustrates an example of one arrangement of resources in a computing network that may employ the processes and techniques according to a first embodiment of the invention; FIG. 2 illustrates an example of one arrangement of resources in a computing network that may employ the processes and techniques according to a second embodiment of the invention;

FIG. 3 shows an example of account info associated with User 1;

FIG.4 shows a Ul example of the USS Interface in FIG. 1;

FIG. 5 shows a Ul example of the storage policy;

FIG. 6 shows a Ul example of the interface to "Set a Storage Policy";

FIG. 7 (a) shows the exemplary process of the data processing in the Data Processing Means from USS Client;

FIG. 7 (b) shows another exemplary process of the data processing in the Data Processing Means from USS Client;

FIG. 8 illustrates a flow diagram of process 700 performed by the Distribution Engine 210 in accordance with an embodiment of this invention; FIG. 9 shows exemplary communication sequences among USS Client, USS Core and different SSs to upload/create data;

FIG. 10 shows exemplary communication sequences among USS Client, USS Core and different SSs when retrieving data is requested;

FIG. 11 shows exemplary communication sequences among USS Client, USS Core and different SSs when deletion of data is requested;

FIG. 12 shows exemplary communication sequences among USS Client, USS Core and different SSs when the USS Core trigger the USS Client to check the status of the subscribed SSs.

FIG. 13 shows another Ul example of the storage policy;

FIG. 14 shows another Ul example of the interface to "Set a Storage Policy";

FIG. 15 shows an exemplary case of adding redundancy through using (6, 3) erasure code;

FIG. 16 shows an exemplary case of a repairing process using the same erasure code as that in FIG. 15;

FIG. 17 shows another exemplary process 800 of the data processing in the Data Processing Means from USS Client;

FIG. 18 illustrates a flow diagram of process 900 performed by the Distribution Engine 210 in accordance with another embodiment of this invention;

FIG. 19 shows another exemplary communication sequences among USS Client, USS Core and different SSs to upload/create data;

FIG. 20 shows another exemplary communication sequences among USS Client, USS Core and different SSs when the USS Core trigger the USS Client to check the status of the subscribed SSs.

Detailed Description of Embodiments of the Invention

Various embodiments of the invention will now be described. The following description provides specific details for a thorough understanding and enabling description of these embodiments. It is to be appreciated by one skilled in the relevant art that the invention may be practiced without some of the details described in the embodiments. Likewise, the features, structures or functions which are well known by one skilled in the relevant art may not be described in detail below, so as to avoid unnecessarily obscuring the relevant description. The terminology used below is to be interpreted in its broadest reasonable manner, even though it is being used in conjunction with a detailed description of certain specific examples of the invention. Indeed, certain terms may even be emphasized below; however, any terminology intended to be interpreted in any restricted manner will be overtly and specifically defined as such in this Detailed Description section. Unless described otherwise below, aspects of the invention may be practiced with conventional data processing and data storage systems. Thus, the construction and operation of the various blocks shown in the Figures may be of conventional design, and need not be described in further detail herein to make and use aspects of the invention, because such blocks will be understood by those skilled in the relevant art. One skilled in the relevant art can readily make any modifications necessary to the blocks in the Figures based on the detailed description provided herein.

Suitable Environments

The Figures and the discussion herein provide a brief, general description of certain suitable computing environments in which aspects of the invention can be implemented. Although not required, aspects of the invention are described in the general context of computer- executable instructions, such as routines executed by a general-purpose computer, e.g., a server computer, wireless device, or personal computer. Those skilled in the relevant art will appreciate that aspects of the invention can be practiced with other communications, data processing, or computer system configurations, including: Internet appliances, hand-held devices (including personal digital assistants (PDAs), wearable computers, all manner of cellular or mobile phones, multiprocessor systems, microprocessor-based or programmable consumer electronics, set-top boxes, network PCs, minicomputers, mainframe computers, and the like. The terms "computer," "server," and the like are generally used interchangeably herein, and refer to any of the above devices and systems, as well as any data processor. Aspects of the invention can be practiced in software that controls or operates data storage hardware that is specifically designed for use in data storage networks, e.g., as described in detail herein. It is to be appreciated by one skilled in the art that file and data are exchangeable to describe the data to be stored in a cloud environment.

While aspects of the invention, such as certain functions, are described as being performed exclusively on a single device, the invention can also be practiced in distributed environments where functions or modules are shared among disparate processing devices, which are linked through a communications network, such as a Local Area Network (LAN), Wide Area Network (WAN), and/or the Internet. In a distributed computing environment, program modules may be located in both local and remote memory storage devices. Aspects of the invention including computer implemented instructions, data structures, screen displays, and other data may be stored or distributed on tangible computer-readable storage media, including magnetically or optically readable computer discs, hard-wired or pre-programmed chips (e.g., EEP OM semiconductor chips), nanotechnology memory, biological memory, or other data storage media. Alternatively, computer implemented instructions, data structures, screen displays, and other data under aspects of the invention may be distributed via communication medium, such as over the Internet or over other networks (including wireless net- works), on a propagated signal on a propagation medium (e.g., an electromagnetic wave(s), a sound wave, etc.) over a period of time, or they may be provided on any analogue or digital network (packet switched, circuit switched, or other scheme).

FIG. 1 illustrates one example of the system for USS services according to a first embodiment of the invention. The system comprises USS Client 100 and USS Core 200. It should be noted that the USS Client 100 and USS Core 200 can be jointly or separately implemented with a computer or a cluster of computers, or any handheld devices, or any form of hand phones, or any form of processors, or any form of services hosted in any cloud platforms. As shown in FIG. 1(a), the USS Client 100, includes USS Interface 110, SS Controller 120, Data Processing Means 130, USS Client Memory 140 and Distributor 150; and the USS Core 200 comprises Distribution Engine 210, SS Monitor 220 and USS Core Memory 230. It is to be appreciated by one skilled in the art that if USS Client 100 and USS Core 200 are implemented with one computer, USS Core Memory 230 and USS Client Memory 140 may be the same.

The USS Interface 110 is the interface to upload, retrieve, update or delete files, directories and other data objects to a plurality of SSs. The USS interface 110 may be any form of Application Programming Interface (API) or any form of User Interface (Ul), or a combination of both. FIG. 4 shows a Ul example of USS Interface 110. The USS Interface 110 may have a USS space showing all the files stored by the user. Alternatively, the USS Interface 110 may contain API to allow retrieving the list of files stored. The USS interface 110 may comprise a Ul "File Operation" or APIs for users or consumers to conduct any file operations such as uploading, retrieving, modifying and deleting. The USS Interface 110 may comprise a Ul such as "Manage SSs" or APIs to allow users or consumers to manage their accounts of SSs when necessary. The USS Interface 110 may also contain a Ul such as "Set a Storage Policy" defined in FIG. 5 or APIs to set a Storage Policy 420. It is to be appreciated by one skilled in the art that the Ul "Manage SSs" and "Set a Storage Policy" may be in other similar forms or appearances and they may contain more sub-interfaces or pop-up windows. In some implementations, the Storage Policy 420 in FIG. 5 may comprise a Cost Policy 428. The Cost Policy 428 is a set of preferences, priorities, rules and/or criteria that specify the requirements of cost. In some implementations, the Storage Policy 420 may comprise an Access Latency Policy 422. The Access Latency Policy 422 is a set of preferences, priorities, rules and/or criteria that specify the requirements of access latency. When the access latency policy is combined with the cost policy, the user can specify whether the storage policy favours shortening the access latency or lower the cost. It is to be appreciated by one skilled in the art that Policies 422 and 428 are not necessary to be all present. FIG. 6 shows an example of the interface to set a storage policy. The interface to set a storage policy can be triggered by clicking "Set a Storage Policy" in the USS interface shown in FIG. 4. The user can enable minimizing access latency by checking the corresponding option or enable minimizing cost by checking the "Minimize Cost" option. It is to be appreciated by one skilled in the art that the interfaces "File Operation", "Manage SSs" and "Set a Storage Policy" may not be all present. It is also to be appreciated by one skilled in the art that the Ul interfaces can also be provided in any form of APIs or a combination of API and Ul.

The SS Controller 120 is configured to allow a user to register, update and de-register for different SSs, gather the account info for each user, and store the info in the USS Client Memory 140. For each subscription between a user and SS, the account info includes at least the total space and available space. The account info may also contain other constraints such as the maximum number of files, the maximum size of the stored file/data and etc. The account info associated with each subscription is also sent to USS Core 200 through communication channel 300 to store in USS Core memory 230. FIG. 3 shows an example of account info associated with User 1, where User 1 has two subscriptions with two SSs. For the ease of use, the user name and/or password may also be stored in the account info if allowance is obtained from the user. Depending on the level of security concern, the account user name and password may not be sent to the USS Core 200. The different SSs can come from different cloud storage service providers, different geographic locations of the same cloud storage service provider or different types of local storage services. Data Processing Means 130 is configured to pre-process the data to be stored. Data Processing Means 130 may comprise a Splitter 132 to divide the data into a plurality of smaller chunks. Data Processing Means 130 may also comprise a Signature Generator 138 to generate the signatures for the individual chunks, the whole data and potentially the Metadata. Distributor 150 in USS Client 100 distributes the chunks generated from the Data Processing Means 130 to different SSs including SSI 310, SS2 320 and SS3 330 based on a distribution pattern through a communication channel 302.

The USS Client Memory 230 stores the account information associated with the user. The USS Client Memory 230 may also store the Metadata of each file. The Metadata of each file comprises all the signatures related to the file, individual chunks and the location of the file. The Metadata of each file may also comprise the configuration information or any other information specific to the file. As the file is spread across different SSs, the locations of each chunk are also included in the Metadata.

Distribution Engine 210 is configured to decide the size of chunk the Data Processing Means 130 divides the data into and the Distribution Pattern. The Distribution Pattern provides the info on which chunks should be distributed to which SS. The Distribution Pattern and the number of chunks are sent through the communication channel 302 based on a fixed setting or a storage policy. The storage policy can be pre-set to be fixed or set by the user through USS Interface 110.

The USS Core Memory 230 stores the account information associated with each user. The USS Core Memory 230 may also store the Metadata of each file. As the data is divided into chunks and spread across a plurality of SSs, Metadata of each file need to record the locations of each chunks. The USS Core Memory 230 may also stores storage policies associated with each file or each user. The storage policy can be a fixed set of parameters stored in the USS Core Memory 230. The storage policy can also be set by each user through the USS Interface 110.

The SS Monitor 220 is configured to send requests to different SSs to check the availability, price or both. The SS Monitor 220 in USS Core 200 also can trigger the SS Client 100 to exchange data with different SSs to check the access latency or speed for different chunk sizes. USS Core 200 may schedule periodic or on-demand checking of different SSs. USS Core 200 may also trigger USS Client to send beacon signals to USS Client to obtain the access delay/speed between the USS Client and SSs. The results can be recorded in SS report or any other possible formats and sent to USS Core 200 through Communication Channel 300. The USS Core 200 receives the SS report and stores it in the USS Memory 230.

The communication channel 300 and 302 can be any form of wireless or wired network. In a distributed system, the communication is implemented with messages on some sort of network transport. The communication channel 300 and 302 may employ any type of known data transfer protocol, such as TCP/IP. In one implementation, the communication channels 300 and 302 are the storage network itself. Any suitable technique may be used to translate commands, faults, and responses to network messages. The communication channel 300 and 302 can also be the same. The communication channel 300 can also be any form of channel within any computer device or mobile devices.

FIG. 7 (a) shows the exemplary process 600 of the data processing in the Data Processing Means 130. The data to be stored is firstly received in Step 610. In Step 618, the chunk size is received from USS Core 200. In Step 670, signatures/identifiers are generated for each chunk and/or the whole data. Signatures/Identifiers include a hash value, message digest, checksum, digital fingerprint, digital signature or other sequence of bytes that substantially uniquely identifies the file in the data storage system. For example, signatures/identifiers could be generated using Message Digest Algorithm 5 (MD5) or Secure Hash Algorithm SHA 512. The signatures/identifiers are generated to check the integrity of data retrieved. The signatures/identifiers are also included in the Metadata. In Step 680, the Metadata associated with the corresponding data to be stored is determined. The signature of the Metadata for each file may also be generated in Step 680.

FIG. 7 (b) shows another exemplary process 602 of the data processing in the Data Processing Means 130. This example optimizes the scenario of modifying existing data. The data to be stored is firstly received in Step 610. In Step 622, the process determines whether the data is new data or a modification of existing data. If it is new data, the steps 618, 670 and 680 are the same as those in process 600. Otherwise, the affected chunks due to the modification are located in Step 632 for the case of modifying the existing data. Dynamic warping related algorithms can be applied to find the similarity between the modified data and the original data to finally find all the affected chunks. In Step 642, new chunks are generated for the affected chunks. In Step 652, signatures/identifiers of the new chunks are generated. In step 662, Metadata is updated. One of the most important advantages of employing the process 602 over the process 600 is lower network traffic as only the affected chunks instead of the whole data are uploaded to the SSs. For example, if new data is appended at the end of the original data. Only the newly added chunks and updated Metadata are sent over the network.

Distribution Engine

FIG. 8 is a flow chart illustrating a process 700 performed by the Distribution Engine 210 in accordance with the first embodiment of this invention. In Step 710, process 700 firstly determines the file size as Fsize- In Step 720, process 800 determines the Storage Policy either by a default policy or based on the input from the user through the interface shown in FIG. 6. In Step 730, process 800 further determines the available SSs for the particular user and the available space from each SS. All these info can be retrieved from USS Core Memory 230. In step 740, process 800 further determines the unit cost for each SS, the access latency from the user to each SS if different chunk sizes are used. The unit cost and access latency can also be retrieved from the USS Client Memory 140.

In Step 760, process 700 determines the size of data distributed to each SS. In an example, the size to each SS can be set as fixed default percentages multiplying with the requirement space Fsize- In another example, the size to each SS can be configured based on the Storage Policy specified by the user as shown in (1) below,

Others

PSizej =

Minimize the cost" is selected

Where Availj represents the size of available space for jth SS, Pricej represents the price for the jth SS. It is worth mentioning that for the case when "Minimize the cost" is selected, it may happen that the PSizej is bigger than Availj. For this case, the balance PSizej-Availj is assigned to the SS with the next lowest price. The same rule applies if the SS with the next lowest price does not have enough available space.

In Step 770, process 700 determines the chunk size CSize. In an example, the chunk size CSize can be set as a fixed value or through the following equation (2). PSize- min( max ^L ) when "Minimize the latency" is selected

CSize = <[ !<;<„„ Ki<C Speedy _{( 2 )}

CSize_default Others

Where Speedy represents the speed between the user to the jth SS for the ith chunks size, C represents the total number of potential chunk size candidates. Speedy can be extracted from the SS report gathered through SS Monitor 220.

Lastly, in Step 790, process 700 determines the Distribution Pattern. It is realized by determining the number of chunks to be distributed to each SS. This is done by the following equation (3) based on the size of the data distributed to each SS and the chunk sized decided by Step 770 and 760.

PSizej Fsize

CNurrij = (3)

Fsize CSize

Where [a] represents the ceiling operation to get the smallest integer that is bigger than a, [aj represents the flooring operation to get the biggest integer that is smaller than a. In the practical implementation, there might be surplus chunks due to the ceiling and flooring operations. In this case, the surplus chunks are distributed to the cheapest available SS if "Minimize the cost" is selected. Otherwise, the surplus chunks with potential padding are distributed to the SS has the largest available space.

Once the number of chunks to each SS is decided, the specific chunks distributed to each SS can be decided in a random order, a sequential order or based on a pre-stored order in the USS Core Memory. One example of the Distribution Pattern based on a random order and a sequential order is illustrated in the second and third row of the table below, respectively.

Table 1: Distribution Pattern - Example 1

It is to be appreciated by one skilled in the art that the Steps 710-770 may not be executed in the same order as that shown in FIG. 8.

Interactions between USS Client, USS Core and SSs

The interactions of the USS Core 200 and the USS Client 100 are defined in terms of functions and return values. The interactions of the USS Client 100 and SS1-SS3 310-330 are also defined in terms of functions and return values.

FIG. 9 shows exemplary communication sequences among USS Client 100, USS Core 200 and different SSs 310, 320 and 330 when creation/uploading, data/files is requested. During creation/uploading of a file, the File size 902, and Storage Policy 910 if there is, may be sent to the USS Core 200 in Step 810. Based on the File size 900 and possible Storage Policy 910, the Distribution Engine 210 in the USS Core 200 determines the Chunk Size 928, Distribution Pattern 926 and sends them to the USS Client 100 in Step 812 to the Splitter 132 in Data Processing Means 130. In Step 816, USS Client 100 distributes the chunks to the respective SSs 310-330 based on the Distribution Pattern 926 received in Step 812. At the same time, the Metadata associated with the data is generated and stored in USS client Memory 140. In Step 814, the Metadata 920 is sent to USS Core 200 and stored in USS Core Memory 230. FIG.

10 shows exemplary communication sequences among USS Client 100, USS Core 200 and different SSs 310, 320 and 330, when retrieving data is requested. During data retrieval, the identifiers of the Metadata stored in the USS Client is retrieved to check whether the Metadata stored locally is the same as that stored in USS Core 200 in Step 818. This is because the user may have updated the file somewhere else. If it is the same, the data can be directly retrieved from the corresponding SSs in Step 822. If it is not the same, the Metadata is synchronized in Step 820 before the retrieval of the data chunks in Step 822. FIG.

11 shows exemplary communication sequences among USS Client 100, USS Core 200 and different SSs 310, 320 and 330 when deletion of data is requested. During data deletion, the identifiers of the Metadata stored in the USS Client 100 is retrieved to check whether the Metadata stored locally is the same as that stored in USS Core 200. This is the same as Step 818 shown in FIG. 10. If the hash is the same for the Metadata, deleting commands will be sent to respective SSs 310-330 according to the Metadata to delete the data in Step 824. After deleting the data in the SSs, Metadata is deleted from the local copy in USS Client 100. A delete command is sent to the USS Core 200 to delete the copy as well. If it is not the same, the Metadata is synchronized by retrieving the most updated Metadata in Step 820. Once the Metadata is synchronized, Steps 824 and 826 will be conducted to delete both the data and its associated metadata. When the user uses the USS Client 100 to update data, the interactions between USS Client 100, USS Core 200 and different SSs 310-330 are the same as those during the creation of the data shown in FIG. 9. To enhance efficiency, it's possible that only additional or modified data is distributed to SSs 310-330 in Step 816 supported by the process 602 illustrated in FIG. 7(b). FIG. 12 shows exemplary communication sequences among USS Client 100, USS Core 200 and different SSs 310, 320 and 330 when the USS Core 200 trigger the USS Client to check the status of the subscribed SSs. In Step 828, USS Core 200 sends command to USS Client 100 to trigger the monitoring of the status of the subscribed SSs. In Step 834, USS Client may send beacon signals or ping the different SSs to generate SS report 930 with any format including plain text, work or excel and etc. The report may capture the latencies between the client and each SS for different chunk sizes, the availability of each SSs. The SS report is sent to USS Core 200 in Step 830.

In another embodiment shown in FIG. 2, the Data Processing Means 130 may further comprise Encryptor 134, Redundant Means 136, or both besides the Splitter 132 and Signature Generator 138 shown in the first embodiment. If the Data Processing Means 130 has Redundant Means 136, the USS Core 200 further comprises an additional Repair Engine 240.

The Encryptor 134 generates a key to be used in encryption. The Encryptor 134 may use standard Data Encryption Standard (DES) or Advanced Encryption Standard (AES) encryption schemes with varying key strengths. The key strengths can be a fixed one or configured by the user. The Data Processing Means 130 may also include redundant means 136 as shown in FIG. 2 (b). Redundant means 136 is responsible to add redundancy for the data stored. Redundancy can be added through making multiple copies (i.e. replications) or erasure codes. Erasure codes applied can be any block codes such as Reed-Solomon codes, Hamming codes or Low Density Parity Check Code (LDPC) or convolutional codes, or Turbo codes. FIG. 15 shows an exemplary case of adding redundancy through using (6, 3) erasure code. The data to be stored 500 is firstly divided into 6 data chunks 510 and 3 redundant chunks 520 are added through encoding process of erasure codes. FIG. 15 shows a Distribution Pattern that the first 3 data chunks 530 are distributed to SS2 320, the second 3 data chunks 540 are distributed to SS3 330 and the 3 redundant chunks 520 are distributed to SSI 310. The corresponding Distribution Pattern can be described by {SS2: Dl, D2, D3; SS3: D4, D5, D6; SSI: PI, P2, P3}. By distributing the data chunks and the added redundant chunks to different SSs, the data is still able to be recovered even a number of SSs become not available or failed.

In some implementations according to the second embodiment, a Storage Policy 420 in FIG. 13 may comprise a Cost Policy 428, Access Latency Policy 422, Reliability Policy 424 and Security Policy 426. It is to be appreciated by one skilled in the art that policies 422-428 are not necessary to be all present. The cost Policy 428 and the Access Latency Policy 422 are the same as described in the first embodiment. A Reliability Policy 424 is a set of preferences, priorities, rules and/or criteria that specify the requirements of reliability. In some implementations, a Storage Policy 420 may comprise a Security Policy 426. A security policy is a set of preferences, priorities, rules and/or criteria that specify the requirements of security. For example, which encryption scheme shall be applied can be specified together with the length of the key. FIG. 14 shows an example of the Ul to "Set a Storage Policy" according to the second embodiment of the invention. The Ul to "Set a Storage Policy" can be triggered by clicking the USS Interface example shown in FIG. 4. Minimizing access latency can be enabled by checking the corresponding option. The number of failed SSs that the USS services can tolerate without losing data can also be specified. Different options such as minimizing space, balanced or replication-based methods can also be selected based on the requirements. Minimizing cost option can be enabled to minimize the cost. It is to be appreciated by one skilled in the art that the interface for accessing the Storage Policy 420 may be in the form of APIs.

Repair Engine 240 is configured to engage a reconstruction process to recover any lost/unavailable chunks. In the event of failure/outage of any SS, Repair engine 240 is triggered to start the reconstruction of the lost/unavailable chunks. It is possible to conduct the reconstruction in the USS Client 100 to achieve even higher security by avoiding USS Core having the access right for different SSs. FIG. 16 shows an exemplary case of a repairing process using the same erasure code as that in FIG. 15. When SS3 is detected to become unavailable as shown in the SS Monitor process in FIG. 15, repairing (i.e. decoding) process is triggered to gather required data and redundant chunks to recover the lost/unavailable data chunks 740 and store them in an available SS4 340. Depending on the property of different erasure codes, the number of chunks required may be fewer. After the repair process, updated Metadata is sent from USS Core 200 to USS Client 100.

FIG. 17 shows the exemplary process 800 of the data processing in the Data Processing Means 130 from USS Client 100. It should be noted that the Steps 610, 618, 670 are the same as described in the first embodiment of the invention. In Steps 620-630, the Code Configuration and Encryption Configuration are received from USS Core 200. In Step 640, the file is divided into a plurality of chunks based on the Code Configuration. In Step 650, redundant chunks are generated through erasure codes or replications based on the Configuration received in Step 620. In Step 660, encryption is done on each of the chunk based on the Encryption Configuration received in Step 630. In Step 670, the signatures associated with each chunks and the whole data are generated. Step 680 is to generate the associated Metadata and its signature. It should be noted that the Metadata generated may also comprise Code Configuration, Encryption Configuration or both. This means that the Metadata stored in USS Client Memory 140 and USS Core Memory 230 also may also comprise Code Configuration, Encryption Configuration or both. It is to be appreciated by one skilled in the art that the Steps 620-660 may not be all present and the execution order may not be exactly the same as that shown in FIG. 17. For example, the order of Step 620 and 630 are exchangeable. Step 660 may be executed before Step 640. In another example, Steps 630 and 660 may not be present if encryption is not required.

FIG. 18 is a flow chart illustrating a process 900 performed by the Distribution Engine 210 in accordance with the second embodiment of this invention. It should be noted that Steps 710-740 are the same as those described in the previous embodiments of this invention. Then, the redundant rate is determined in Step 750. If the redundant rate is r, the total space T required for a file with size Fsize is defined in the following equation (4),

T = r F_size (4)

The steps of determining the redundant rate is described further below. In an example, the redundant rate r can be set to a fixed value that is no less than 1. In another example, the redundant rate can be configured based on the user's requirement specified in the Storage Policy. As shown in FIG. 14, the user can set the Reliability Policy as "Minimize Space" to save the total space with an expected reliability level expressed as the maximum number of SSs' failure/unavailable being able to tolerate. If the maximum number of SSs' failure/unavailable being able to tolerate is set to f_max, the redundant rate can be determined according to the following equation (5),

! min(r ) When "Minimize Space " is selected,

median (ry) When "Balanced " is selected. (5) f_max + 1 When "Replication" is selected, where it is assumed that there are a range of candidates of redundancy η and J represents the indices satsifying η

nss

≥ and n_ssrepresents the total number of available SSs for the user. nSS ~ fmax

In Step 760, process 700 determines the size of data distributed to each SS. In an example, the size to each SS can be set as fixed default percentages multiplying with the requirement space T. In another example, the size to each SS can be configured based on the Storage Policy specified by the user as shown in (6) below,

Avail _j

T -n^— -— Others

Availj

PSizei = { (6)

¹ ' l/(∑"ff Price j) '

T When "Minimize the cost" is selected

Price j

Where Availj represents the size of available space for jth SS, Pricej represents the price for the jth SS. It is worth mentioning that for the case when "Minimize the cost" is selected, it may happen that the PSizej is bigger than Availj. Then the balance PSizej-Availj is assigned to the SS with the next lowest price. The same rule applies if the SS with the next lowest price does not have enough available space.

In Step 770, process 700 determines the chunk size CSize. In an example, the chunk size

In Step 780, process 700 determines the code configuration through the following equation

(8), code_Config =

(k, n)erasure Code, where k = ceil ( -^- ) , n = rk. When "Minimize Space "

VCSize/

/ "Balanced "is selected

r When "Replication" is selected.

In practical implementation, the calculated k may not be available, the nearest possible candidate k may be selected instead. Lastly, in Step 790, process 700 determines the Distribution Pattern. The pattern can be decided run-time or be loaded based on the configuration pre-stored in the USS Core Memory. The Distribution Pattern is different for each selected erasure codes. E.g. if a (8, 6) -S code with r = 1.33 is applied and n≤s = 4. The part size is the same for each SS. Table 2 shows the examples of Distribution Pattern when the user requests for tolerating failures of any 1 out of 4 SSs or any 2 out of 8 SSs. One example of the Distribution Pattern can be illustrated in the third column of the second row to tolerate the failure of any 1 out of 4 SSs. Apparently, the Distribution Pattern shown in the second row in Table 2 satisfies the reliability requirement of tolerating failures of any 1 out of 4 SSs or any 2 out of 8 SSs.

Table 2: Distribution Pattern - Example 2

Interactions between USS Client, USS Core and SSs

FIG. 19 shows exemplary communication sequences among USS Client 100, USS Core 200 and different SSs 310, 320 and 330 when creation/uploading, data/files is requested. During creation/uploading of a file, the File size 822, and Storage Policy 810 if there is, may send to the USS Core 200 in Step 810. Based on the File size 902 and possible Storage Policy 910, the Distribution Engine in the USS Core 200 determines the Code Configuration 922, Encryption Configuration 924, Distribution Pattern 926, Chunk Size 928 and sends the info 922-928 to the USS Client 100 in Step 812 to start the encoding and encryption in Data Processing Means 130 when necessary. In Step 816, USS Client 100 distributes the chunks to the respective SSs 310-330 based on the Distribution Pattern 926 received in Step 812. At the same time, the Metadata associated with the data is generated and stored in USS client Memory 140. In Step 814, the Metadata is sent to USS Core 200 and stored in USS Core Memory 230. It should be noted that the exemplary communication sequences among USS Client 100, USS Core 200 and different SSs 310, 320 and 330 when the retrieving and deleting data are the same as those described in the first embodiment of the invention.

FIG. 20 shows exemplary communication sequences among USS Client 100, USS Core 200 and different SSs 310, 320 and 330 when the USS Core 200 trigger the USS Client 100 to check the status of the subscribed SSs. It should be noted that the Steps 828-830 are the same as those described in the previous embodiments of this invention. In the event of failure/unavailability from any SS, the repair engine 240 in USS Core 200 will be triggered to recover the affected data stored in the failed/unavailable SS. In step 832, new MetaData will be generated and sent to USS Client 100.

Claims

1. A system for universal storage services (USS), comprising: a storage service controller configured to allow a plurality of storage services to be registered, and allow the registered storage services to be de-registered; a USS interface configured to allow a file or a data object to be uploaded, downloaded or deleted to/from a unified space where details of the registered storage services are hidden and the unified space has an aggregated capacity obtained from the registered storage services; a distribution engine configured to determine a chunk size and a distribution pattern which comprises information of part of the file or the data object to be distributed to any one of the registered storage services, whereby a single registered storage service cannot re-assemble the file or the data object back; a data processing means configured to slice the file or the data object into a plurality of chunks with the determined chunk size; and a distributor configured to distribute the chunks of the file or the data object based on the determined distribution pattern, to the registered storage services.

2. The system according to claim 1, wherein the storage services comprise at least two of the following services:

at least one cloud storage service provided by different cloud service providers, and/or at least one cloud storage service provided by a same cloud service provider but with different geographical locations or different registration accounts, and/or

at least one private cloud storage service, and/or

at least one local storage service.

3. The system according to claim 2, further comprising:

a storage service monitor configured to check availability, and/or the unit costs, and/or available space, and/or access latencies of the registered storage services.

4. The system according to claim 3, wherein the USS interface further comprises:

a storage policy including any one or any combination of the following policies:

a cost policy configured to allow a set of preferences, priorities, rules and/or criteria to be specified for requirements of cost;

an access latency policy configured to allow a set of preferences, priorities, rules and/or criteria to be specified for requirements of access latencies;

a security policy configured to allow a set of preferences, priorities, rules and/or criteria to be specified for requirements of security; and

a reliability policy configured to allow a set of preferences, priorities, rules and/or criteria to be specified for requirements of reliability.

5. The system according to claim 4, wherein the distribution engine further comprises:

a controller configured to determine the chunk size and the distribution pattern based on any one or any combination of the following:

a size of the file or the data object; unit costs of the registered storage services;

availability of the registered storage services;

access latencies of the registered storage services; and

the storage policy.

6. The system according to claim 5, further comprising:

a redundant means configured to generate a plurality of redundant chunks based on the plurality of chunks of the file or the data object and an erasure code, wherein the number of the redundant chunks is greater than the number of the chunks of the file or the data object, and not less than the number of the storage services; and

a repair engine configured to recover a lost chunk when any of the registered storage services is unavailable.

7. The system according to claim 6, wherein the distribution engine is further configured to determine a code configuration of the erasure code based on the storage policy.

8. The system according to claim 7, wherein the code configuration further comprises:

a redundant rate of the erasure code and a dimension of the erasure code.

9. The system according to claim 8, wherein the erasure code comprises:

a non-MDS code including a Hamming code or a Low Density Parity Check Code (LDPC) or a convolutional code, or a Turbo code.

10. The system according to claim 5 or claim 9, wherein the data processing means further comprises:

an ecryptor configured to encrypt the chunks.

11. The system according to claim 10 wherein the distribution engine is further configured to determine an encryption configuration based on the storage policy.

12. A system with client-server architecture for universal storage services (USS), comprising: at least one client, and at least one server,

wherein each of the at least one client comprises:

a storage service controller configured to allow a plurality of storage services to be registered, and allow the registered storage services to be de-registered;

a USS interface configured to allow a file or a data object to be uploaded, downloaded or deleted to/from a unified space where details of the registered storage services are hidden and the unified space has an aggregated capacity obtained from the registered storage services;

a data processing means configured to slice the file or the data object into a plurality of chunks with the determined chunk size; and

a distributor configured to distribute the chunks of the file or the data object based on the determined distribution pattern, to the registered storage services; and

wherein each of the at least one server comprises:

a distribution engine configured to determine a chunk size and a distribution pattern which comprises information of part of the file or the data object to be distributed to any one of the registered storage services, whereby a single registered storage services cannot re-assemble the file or the data object back.

13. The system according to claim 12, wherein the storage services comprise at least two of the following services:

at least one private cloud storage service, and/or

at least one local storage service.

14. The system according to claim 13, further comprising:

a storage service monitor configured to check availability, and/or the unit costs, and/or available space, and/or access latencies to the registered storage services.

15. The system according to claim 14, wherein the USS interface further comprises:

an access latency policy configured to allow a set of preferences, priorities, rules and/or criteria to be specified for requirements of access latency;

16. The system according to claim 15, wherein the distribution engine further comprises:

a size of the file or the data object;

unit costs of the registered storage services;

availability of the registered storage services;

access latencies of the registered storage services; and

the storage policy.

17. The system according to claim 16, further comprising:

18. The system according to claim 17, wherein the distribution engine is further configured to determine a code configuration of the erasure code based on the storage policy.

19. The system according to claim 18, wherein the code configuration further comprises:

a redundant rate of the erasure code and a dimension of the erasure code.

20. The system according to claim 19, wherein the erasure code comprise:

a non-MDS codes including a Hamming code or a Low Density Parity Check Code (LDPC) or a convolutional code, or a Turbo codes.

21. The system according to claim 16 or claim 20, wherein the data processing means further comprises:

an ecryptor configured to encrypt the chunks.

22. The system according to claim 21 wherein the distribution engine is further configured to determine an encryption configuration based on the storage policy.

23. A method for universal storage services (USS), comprising:

registering, by a storage service controller, a plurality of storage services;

conducting, by a USS interface, file operations on a file or a data object to/from a unified space where details of the registered storage services are hidden and the unified space has an aggregated capacity obtained from the registered storage services;

determining a chunk size and a distribution pattern which comprises the info of at least part of the file or data object to be distributed to any one of the registered storage services, whereby a single registered storage service cannot re-assemble the file or the data object back; slicing the file or the data object into a plurality of chunks; and

distributing the plurality of chunks based on the determined distribution pattern, to the registered storage services.

24. The method according to claim 23, wherein the storage services comprise at least two of the following services:

at least one private cloud storage service, and/or

at least one local storage service.

25. The method according to claim 24, further comprising:

checking, by a storage service monitor, availability, and/or unit costs, and/or available space, and/or access latencies of the registered storage services.

26. The method according to claim 25, wherein the USS interface further comprises:

27. The method according to claim 26, wherein the chunk size and the distribution pattern are determined based on any one or any combination of the following:

a size of the file;

unit costs of the registered storage services;

availability of the registered storage services;

access latencies of the registered storage services; and

the storage policy.

28. The method according to claim 27, further comprising:

generating a plurality of redundant chunks based on the plurality of chunks of the file or the data object and an erasure code, wherein the number of the redundant chunks is greater than the number of the chunks of the file or the data object, and not less than the number of the storage services; and

recovering a lost chunk when any of the registered storage services is unavailable.

29. The method according to claim 28, further comprising:

determining a code configuration of the erasure code based on the storage policy.

30. The method according to claim 29, wherein the code configuration further comprises: a redundant rate of the erasure code and a dimension of the erasure code.

31. The method according to claim 30, wherein the erasure code comprises:

a non-MDS codes including a Hamming code or a Low Density Parity Check Code (LDPC), or a convolutional code, or a Turbo code.

32. The method according to claim 27 or claim 31, further comprising:

encrypting the chunks.

33. The method according to claim 32, further comprising:

determining an encryption configuration based on the storage policy.