WO2020233765A2

WO2020233765A2 - Distributed storage system in fog computing

Info

Publication number: WO2020233765A2
Application number: PCT/DZ2020/050007
Authority: WO
Inventors: Amir DJOUAMA; Meriem Affaf BAZA; Yacine BAZA
Original assignee: Djouama Amir; Baza Meriem Affaf; Baza Yacine
Priority date: 2019-05-22
Filing date: 2020-05-18
Publication date: 2020-11-26
Also published as: WO2020233765A3

Abstract

The fog infrastructure was introduced as a new model to facilitate the transfer of data wirelessly to peripherals distributed in the Internet of Things (IoT) network paradigm. It represents a platform that is intermediate between the cloud and connected things, and it provides processing, storage and network services. The invention pertains to one of the most important services in fog computing, namely storage, and it consists in producing a fog-distributed storage system. Each node, once it has joined a network, will have to free up a portion of its storage capacity. This freed-up capacity will be added to other spaces freed up by the other nodes. This additional space will be a storage space common to all of the participating nodes. The invention also allows the use of this space by end users through well-defined mechanisms.

Description

Distributed storage system in Fog Computing

The present invention relates to the realization of a distributed storage system in Fog Computing. Storage is a strategic matter in the digital field and a key element of Fog Computing. Through the use of peripherals (computers, mobile phones and other smart devices), this storage system allows data to be stored as close as possible to end users by exploiting a storage space common to the nodes of the system.

Data storage has evolved over the years to meet the needs of businesses and individuals. We are now coming to a tipping point where the traditional approach to storage no longer works, for both technical and economic reasons. So we need a new approach, a new concept of data storage. Currently, the best approach to meeting today's demands for data storage appears to be distributed storage.

Distributed storage stores data on a multitude of standard servers that behave like a single storage system. It can take one of three forms: block, file or object storage.

Object storage is used to manage very large volumes of unstructured data. The system is based on objects and metadata, so each file corresponds to an object with an associated path (an address) and a unique identifier.

An object storage system must satisfy several properties to ensure good quality of service. These properties are:

Connectivity: the storage system must ensure operational continuity when a site is disconnected [Patcit1]

Data locality: the idea is to promote local access whenever possible, and to minimize data access time [Patcit2]

Network confinement between sites: an action (write or read) performed on one site must have no negative impact on other sites [Patcit3]

Support for mobility: the storage service should be able to move user data transparently from the previous site to a closer site [Patcit4]

Scalability: the system must support a very large number of Fog sites, a large number of clients per site and a large number of stored objects [Patcit5]

Among the object storage solutions available, Cassandra and IPFS (the latter coupled with a Scale-Out NAS system) were selected, because their designs do not require a centralized server and they are similar to the proposed solution.

Cassandra is a distributed storage system for managing very large amounts of structured data spread over many servers. It is based on a single-hop distributed hash table (DHT): the key space is divided into ranges that are distributed among the nodes making up the system. A gossip protocol is used to propagate the system topology [Patcit6]

IPFS (InterPlanetary File System) is an object storage solution based on the BitTorrent protocol and the Kademlia DHT. It has been coupled with a Scale-Out NAS system to remedy the fact that, on its own, it does not take advantage of the physical topology of the Fog infrastructure. Scale-out storage is a network-attached storage (NAS) architecture; it is a flexible storage system that can adapt dynamically to demand [Patcit7]

A Distributed File System (DFS) is a system that manages files and folders on multiple computers. The data is stored on a server and is accessed and handled as if it were stored on the local client computer; however, the servers keep full control over the data and provide access control to the clients. A DFS facilitates the sharing of information and files among the users of a network in a controlled and authorized manner. Such a system must be transparent, fault tolerant and scalable.

There are many distributed file systems. One is HDFS, a central component of the Apache Hadoop framework and, more precisely, of its storage layer. It is capable of managing thousands of nodes without operator intervention. HDFS is a centralized distributed file system based on a master/slave architecture: the metadata is managed by a single server called the NameNode, and the data is divided into blocks [the article “Analysis of Six Distributed File Systems” by Benjamin Depardon, Gaël Le Mahec and Cyril Séguin]. Another is the Ceph file system (CephFS), a fully distributed, POSIX-compatible file system that uses a Ceph storage cluster to store its data.

Finally, block-level storage is one of the most popular storage technologies. In this storage mode, raw volumes of data are created and each block can be controlled as an individual hard drive. These blocks are controlled by server-based operating systems, and each block can be individually formatted with the required file system. A block is a sequence of bytes. Block-based storage interfaces are the most common way to store data on rotating media such as hard drives, CDs and floppy disks, and even on traditional 9-track tapes.

There are several block-mode storage systems, including the Ceph Block Device. It provides clients with reliable, distributed and high-performance block storage; it stores a block of data striped sequentially across multiple OSDs in a Ceph cluster.

From this study, we conclude that the three types of distributed storage system (object, file and block) are complementary and that each addresses a specific level of abstraction. The downside of most of these solutions, however, is that the data is stored on external data servers, which can require a large investment and expose the data to security risks and malicious use. Our solution therefore uses peripherals and devices that already exist inside the organization. The idea is to use a storage space common to the nodes of the system; this space is the sum of the storage capacities released by each node belonging to the network. Because the data is stored inside the organization, the organization gains better security for its data.

The Fog infrastructure was introduced as a new model to facilitate wireless data transfer to distributed devices in the Internet of Things (IoT) network paradigm. It represents an intermediate platform between the Cloud and connected objects, and it provides processing, storage and network services.

The invention deals with one of the most important services in the Fog, namely storage: it is the realization of a distributed storage system in Fog Computing.

Each node, once it has joined a network, will have to free up part of its storage capacity. This freed capacity will be added to the spaces freed by the other nodes. The resulting space will be a common storage space for all participating nodes. The invention also allows the exploitation of this space by end users through well-defined mechanisms.

The essential aim of this invention is to achieve a distributed system, based on the Fog infrastructure, that stores data as close as possible to users. Since the Fog architecture is an intermediate infrastructure between the Cloud and connected objects, it addresses one of the disadvantages of the Cloud, namely that data is stored in external, remote structures, which exposes it to a risk of theft and malicious use. The Fog has a decentralized architecture offering processing, storage and network services.

A further objective of this invention is to provide a common storage space built from the unused storage space of the peripherals belonging to a given network. This makes it possible to use existing peripherals and tools and to avoid investing in the purchase or allocation of new data servers.

Our approach is based on a hybrid P2P architecture. It combines the advantages of centralized and decentralized systems, which preserves a degree of scalability while avoiding the problems of fully centralized or fully decentralized systems.

Figure 2 illustrates the architecture of our system as a whole. It is made up of several local networks (2.D) linked together to form a global network. The architecture relies on a proactive routing protocol, so that a dynamic routing table, covering all the devices in the system, is distributed across the network. Each Fog node (2.C) communicates with the other Fog nodes of the system, and each of them holds the routing table mentioned above. If the network topology changes, the Fog node sends a broadcast message to the entire network.
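As a rough illustration of this proactive scheme, the sketch below keeps one copy of the routing table per Fog node and sends every topology change to all the other Fog nodes. The class `FogNode`, its method names and the table layout (a device identifier mapped to the Fog node that manages it) are assumptions made for this example only; the invention does not specify them.

```python
class FogNode:
    """Hypothetical Fog node holding a copy of the distributed routing table."""

    def __init__(self, name):
        self.name = name
        self.peers = []          # the other Fog nodes of the global network
        self.routing_table = {}  # device id -> name of the managing Fog node

    def connect(self, other):
        """Link two Fog nodes in both directions."""
        self.peers.append(other)
        other.peers.append(self)

    def device_joined(self, device_id):
        """A device joined this local network: record it, then broadcast."""
        self.routing_table[device_id] = self.name
        self._broadcast(("add", device_id, self.name))

    def device_left(self, device_id):
        """A device left this local network: forget it, then broadcast."""
        self.routing_table.pop(device_id, None)
        self._broadcast(("remove", device_id, self.name))

    def _broadcast(self, update):
        # Assumption: every Fog node is directly linked to every other one.
        for peer in self.peers:
            peer.receive(update)

    def receive(self, update):
        """Apply a topology update sent by another Fog node."""
        action, device_id, owner = update
        if action == "add":
            self.routing_table[device_id] = owner
        else:
            self.routing_table.pop(device_id, None)


fog_a, fog_b = FogNode("fog-a"), FogNode("fog-b")
fog_a.connect(fog_b)
fog_a.device_joined("laptop-1")
fog_b.device_joined("nas-1")
fog_a.device_left("laptop-1")
print(fog_a.routing_table)  # {'nas-1': 'fog-b'}
print(fog_b.routing_table)  # {'nas-1': 'fog-b'}
```

After the three events, both Fog nodes hold the same table, which is the property the broadcast is meant to preserve.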

Each local network (2.D) consists of several active (2.B) and passive (2.A) nodes linked together and managed by a more powerful computing unit, selected to act as the Fog server (2.C) that serves the others.

The aims of realizing such a distributed storage system are to:

Save, recover and/or delete a file in the system;

Ensure the extensibility and scalability of the architecture, that is, allow devices (nodes) to be added or removed;

Guarantee the security and confidentiality of data: the system must be highly secure because the information must not be accessible to everyone;

Guarantee data integrity, that is, confirm that the data that has been sent, received or stored is complete and unmodified;

Maintain the consistency of the data, that is, ensure that the copies are automatically synchronized with the original file;

Ensure data availability, that is, ensure that the user can view or retrieve their data at any time;

Ensure uniform data distribution, that is, ensure that user data is distributed in a balanced and uniform manner;

Preserve the transparency of the system, that is, hide storage details and complex system mechanisms from the user while still allowing them to manipulate their files.

The description of the invention is detailed below with 4 supporting figures:

Fig. 1 describes the general principle of the invention, in which each peripheral frees a percentage of its storage capacity so that the network has a common storage space.

A: Peripherals;

B: Releases x% of storage;

C: Total storage space.

Fig. 2 is a representation of the overall architecture of the invention.

A: Passive devices;

B: Active devices;

C: Fog node;

D: Local network.

Fig. 3 represents the general principle of the distribution of a file in a network.

A: Client device;

B: Fog node;

C: Peripheral storing block number i;

1: Send the file size;

2: Return the devices that can store the file;

3: Distribute the file.

Fig. 4 represents the general principle of dividing a file into a set of blocks.

A: Initial file;

B: Block number i;

C: Peripheral storing block number i;

1: Divide the initial file into a set of blocks;

2: Send the blocks to the selected devices.

The proposed solution allows each node in our architecture to free up storage space. We thus obtain a global storage space, comprising all the freed spaces, that an end user can use in a transparent way. This principle is represented in figure 1, where each peripheral (1.A) frees a percentage of its storage capacity (1.B) so as to obtain a storage space common to all the peripherals belonging to the network (1.C).
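The principle of figure 1 amounts to a simple sum. The sketch below computes the common space from a list of peripherals; the function name and the capacities and percentages used are illustrative assumptions, not values taken from the invention.

```python
def common_storage_space(peripherals):
    """Return the common space: the sum of the capacity freed by each peripheral.

    `peripherals` is a list of (capacity_gb, freed_percent) pairs.
    """
    return sum(capacity * percent / 100 for capacity, percent in peripherals)


# Three hypothetical peripherals (1.A), each releasing 20% of its capacity (1.B).
network = [(500, 20), (250, 20), (50, 20)]
print(common_storage_space(network))  # 160.0 GB of common storage space (1.C)
```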

For a better use of this system, we will use at least three types of devices with different roles and specifications. These three types are defined as follows:

Fog Node (Fog Server): a powerful and flexible intelligent computing unit that manages resources along three dimensions, with on-board storage, compute and communication capacities. It is responsible for network management, security, storage and metadata.

Passive Device: a device that frees up a large storage space in the system; its only role is to store data blocks.

Active Device: a device that usually frees up only a small storage space in the system; its main role is file processing, that is, it is the type of device from which a user can access the system.

The system can, however, operate with only two types of devices, Fog nodes and active devices, provided that the active devices free up more storage space.

The system also specifies how to distribute blocks of data in a Fog network consisting of multiple nodes (devices). It implements a block recovery mechanism as well as a copy synchronization mechanism.

Figure 3 shows the general principle of the distribution of the data blocks of a file. The client device (3.A) loads the file and sends its size (3.1) to the Fog node (3.B), which selects and returns the list of available devices that can store the file (3.2). The client device (3.A) uses the list received to divide the file into blocks whose sizes are proportional to the free storage space of the chosen peripherals and to a predefined threshold. Finally, the client device distributes the resulting blocks (3.3) to the chosen peripherals (3.C).
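The three steps of figure 3 can be sketched as follows. This is only one possible reading, under stated assumptions: the function names, the device identifiers, the choice to consider the devices with the most free space first, and the handling of rounding leftovers are all hypothetical, and the threshold is expressed as an integer percentage of each device's free space.

```python
def select_devices(file_size, free_space, threshold_percent):
    """Fog node side (3.2): pick devices whose usable space covers the file.

    `free_space` maps a device id to its free bytes. Only `threshold_percent`
    of that space may be used in one distribution operation.
    """
    chosen, usable_total = {}, 0
    # Assumption: devices with the most free space are considered first.
    for device, free in sorted(free_space.items(), key=lambda item: -item[1]):
        usable = free * threshold_percent // 100
        if usable == 0:
            continue
        chosen[device] = usable
        usable_total += usable
        if usable_total >= file_size:
            return chosen
    raise ValueError("not enough usable space in the network for this file")


def block_sizes(file_size, usable):
    """Client side: block sizes proportional to each device's usable space."""
    total = sum(usable.values())
    if file_size > total:
        raise ValueError("file larger than the usable space offered")
    sizes = {dev: file_size * space // total for dev, space in usable.items()}
    # Integer division can leave a few bytes over: hand them to devices with room.
    leftover = file_size - sum(sizes.values())
    for dev in sizes:
        extra = min(leftover, usable[dev] - sizes[dev])
        sizes[dev] += extra
        leftover -= extra
    return sizes


def split_file(data, sizes):
    """Client side (3.3): cut `data` into consecutive blocks, one per device."""
    blocks, offset = {}, 0
    for dev, size in sizes.items():
        blocks[dev] = data[offset:offset + size]
        offset += size
    return blocks


data = bytes(i % 251 for i in range(1200))            # a 1200-byte "file"
free = {"pc-1": 4000, "nas-1": 10000, "phone-1": 2000}
usable = select_devices(len(data), free, threshold_percent=10)
sizes = block_sizes(len(data), usable)
blocks = split_file(data, sizes)
print(sizes)  # {'nas-1': 858, 'pc-1': 342}
```

With a 10% threshold, the two largest devices offer 1000 and 400 usable bytes, which is enough for the 1200-byte file, so the third device is not selected.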

The threshold is a value associated with each type of device (active or passive). It defines the fraction of a device's free storage space that must not be exceeded in a single distribution operation. For example, for a device with 10 GB of free space and a threshold of 10%, the storage space that can be used for one distribution operation is 1 GB.
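The arithmetic of this example can be written out as a one-line function. The function name is hypothetical, and the second call assumes that the threshold applies to the space still free at the time of each operation, which the description does not state explicitly.

```python
def usable_space(free_space_gb, threshold_percent):
    """Space (in GB) a device may receive in a single distribution operation."""
    return free_space_gb * threshold_percent / 100


print(usable_space(10, 10))  # 1.0 GB, the example given in the text
# Assumption: after 1 GB has been stored, the same rate applies to the remaining 9 GB.
print(usable_space(9, 10))   # 0.9 GB
```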

We use this approach in order to prevent our system from becoming unbalanced, that is, having at one time T some saturated devices and others totally empty, which could cause uneven distribution.

To improve the availability of user data, replication is necessary. For this purpose, we have set up a matrix grouping together all the devices that store the data blocks and their copies. Each row of the matrix is a list of target devices holding the data blocks of the same copy, and the entries in the same column hold the same block, that is, the original block and its copies.
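A minimal sketch of such a matrix is given below. The layout (one row per copy, one column per block) follows the description; the round-robin placement that shifts by one device per row is an assumption introduced here so that a block and its copies never share a device, and the function and device names are hypothetical.

```python
def build_copy_matrix(block_count, devices, copies):
    """Return a matrix with one row per copy and one column per block.

    matrix[r][c] is the device storing copy r of block c; row 0 is the original.
    Assumption: devices are assigned round-robin, shifted by one device per row.
    """
    if copies + 1 > len(devices):
        raise ValueError("need more devices than copies to keep replicas apart")
    return [
        [devices[(column + row) % len(devices)] for column in range(block_count)]
        for row in range(copies + 1)
    ]


matrix = build_copy_matrix(block_count=3, devices=["d1", "d2", "d3", "d4"], copies=2)
for row in matrix:
    print(row)
# ['d1', 'd2', 'd3']   original blocks
# ['d2', 'd3', 'd4']   first copy
# ['d3', 'd4', 'd1']   second copy
```

Reading a column top to bottom gives the devices that hold the same block, for example `d1`, `d2` and `d3` for the first block.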

To ensure the consistency of the file with its copies, we have implemented a synchronization process which, when the original file is modified, updates its copies. The system can reuse the devices storing the old version of the file; from the new size of the file, it can determine whether additional devices are needed.
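The decision described here, whether the devices already in use are sufficient for the new version, can be sketched as a comparison between the new file size and the space those devices can offer. The function name and the assumption that the old blocks are overwritten (so that their space counts as available) are ours.

```python
def extra_space_needed(new_size, reusable_capacity):
    """Return the bytes that must be placed on additional devices (0 if none).

    `reusable_capacity` maps each device already storing the old version of the
    file to the bytes it can hold for the new version. Assumption: old blocks
    are overwritten, so the space they occupy counts as available.
    """
    return max(0, new_size - sum(reusable_capacity.values()))


current = {"nas-1": 858, "pc-1": 342}        # hypothetical devices, 1200 bytes in all
print(extra_space_needed(1000, current))     # 0: the same devices are enough
print(extra_space_needed(1500, current))     # 300: additional devices are needed
```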

Retrieving a file exploits the copy matrix. To recover all the blocks of a file and reconstruct it, the system traverses the first row of the matrix, recovering the blocks as it goes. If one of the blocks of the file is on a faulty device, the system accesses its copy, which is in the same column. One of the main features of this system is therefore that it handles device failures.
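The traversal described above can be sketched as follows: for each column, the first row is tried first and, if its device is faulty, the next rows of the same column are tried. The data structures (a matrix of device identifiers, a dictionary of stored blocks and a set of faulty devices) and all names are assumptions made for this example.

```python
def retrieve_file(matrix, stored_blocks, failed_devices):
    """Rebuild a file by walking the first row of the copy matrix.

    matrix[r][c] is the device holding copy r of block c (row 0 = original).
    stored_blocks maps (device, block index) to the bytes of that block.
    """
    parts = []
    for column in range(len(matrix[0])):
        for row in matrix:                    # row 0 first, then the copies
            device = row[column]
            if device not in failed_devices:
                parts.append(stored_blocks[(device, column)])
                break
        else:
            raise OSError(f"block {column} is unavailable on every device")
    return b"".join(parts)


matrix = [["d1", "d2"],      # original blocks
          ["d2", "d3"]]      # one copy of each block
stored = {("d1", 0): b"fog ", ("d2", 1): b"storage",
          ("d2", 0): b"fog ", ("d3", 1): b"storage"}
print(retrieve_file(matrix, stored, failed_devices={"d1"}))  # b'fog storage'
```

Here device `d1` is faulty, so block 0 is read from its copy on `d2`, while block 1 is read from the first row as usual.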

Patent documents

The article "Which storage system for Fog architectures?" by B. Confais, A. Lèbre and B. Parrein.

The article "Performance Analysis of Object Store Systems in a Fog / Edge Computing Infrastructures" by B. Confais, A. Lèbre and B. Parrein.

The article "Performance Analysis of Object Store Systems in a Fog and Edge Computing Infrastructures" by B. Confais, A. Lèbre and B. Parrein.

The article "An object store for Fog infrastructures based on IPFS and a Scale-Out NAS" by B. Confais, A. Lèbre and B. Parrein.

Claims

1. A system according to the invention, characterized in that this new generation of storage system, distributed in a network through the use of simple user machines, allows blocks of files to be distributed directly to the hard drives of the nodes. The system according to the invention is characterized in that the source files are divided into secure blocks, according to the equipment connected to the network and according to the storage space available in each of the nodes, and then distributed in the network, using the Fog nodes as the distributing element. The system according to the invention is characterized in that the recovery of the blocks is done using the Fog nodes. This solution therefore allows a significant gain in storage in the system and better efficiency at lower cost. The use of a set of heterogeneous nodes distributed in the network allows scalability of the system and independence from other providers. Moreover, the presence of data within the internal network enhances the availability and security of data.
2. A system according to claim 1, characterized in that a node is a device capable of connecting to a network. These nodes are provided with storage space.
3. A system according to claims 1 and 2, characterized in that a file may be of any type and of arbitrary, unlimited size. These files are available at the nodes connected to the network.
4. A system according to claims 1 to 3, characterized in that the storage disk, internal or external, is linked to the node and contains the various files stored or to be distributed. A percentage P of the available storage space is reserved for operation.
5. A system according to claims 1 to 4, characterized in that a block (4.B) is a file resulting from the division of an initial file (4.A) and intended to be saved on the nodes of the network (4.C).
6. A system according to claims 1 to 5, characterized in that the saved blocks are encrypted according to an identified mechanism in order to improve security.
7. A system according to claims 1 to 6, characterized in that the Fog node is a node connected to the network. This node has the particularity of calculating the sizes of the blocks of the file to be stored. The Fog node selects the nodes that will store the blocks and subsequently stores their addresses. Furthermore, a Fog node brings together a set of network nodes identified by a known distance.
8. A system according to claims 1 to 7, characterized in that the division of the file takes place at the initiator node and is calculated at the Fog node according to a defined threshold of the available storage space of the disks of the nodes. The division of the source file into blocks (4.1) is done according to the block sizes calculated by the Fog node.
9. A system according to claims 1 to 8, characterized in that the distribution is done by the initiator node, which divides the file into blocks (4.1) and then sends them to the selected destination nodes (4.2).
10. A system according to claims 1 to 9, characterized in that a set C of copies is created and distributed to the nodes in order to ensure the availability of data.
11. A system according to claims 1 to 10, characterized in that the recovery of the blocks is done by means of the Fog nodes, where the locations of the blocks are recorded. This information is sent to the initiator nodes so that they can directly query the backup nodes where the blocks are stored.