EP2212791A2 - Improved computer system comprising a plurality of networked nodes - Google Patents
Improved computer system comprising a plurality of networked nodes
- Publication number
- EP2212791A2 (application EP08837674A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- index
- storage
- addresses
- virtual
- storage unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2053—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
- G06F11/2056—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
- G06F11/2061—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring combined with de-clustering of data
Definitions
- Improved computer system comprising a plurality of networked nodes
- the invention relates to computer systems comprising several computer stations called nodes interconnected in a network.
- Modern networks include user stations that are connected to one or more servers and can share applications and/or storage spaces locally or remotely.
- the invention improves the situation.
- the invention proposes a computer data storage tool comprising a correspondence module connected to storage units, said correspondence module comprising a correspondence function for determining at least a first and a second storage address from an incoming virtual address.
- the correspondence module maintains a first table comprising data for identification of failed storage units, as well as a second table comprising data for modification of virtual address blocks, and the computer tool includes a recovery unit arranged, upon recovery of a failed storage unit, to update the storage addresses of this storage unit by calling the correspondence module with virtual addresses derived from the second table, on the basis of the data of the first table.
- FIG. 1 shows a general functional view of a computer system according to the invention;
- FIG. 2 shows an example of a logical implementation of the system of FIG. 1;
- FIG. 3 shows an exemplary composition of an element of FIG. 2;
- FIG. 4 shows a method of accessing a file in the system of FIG. 1;
- FIG. 5 shows an exemplary implementation of an element of FIG. 3;
- FIG. 6 shows a correspondence between logical spaces and physical spaces managed by the element of FIG. 5;
- FIG. 7 shows an example of a function implemented by the element of FIG. 5 to establish the correspondence of FIG. 6;
- FIG. 8 shows an exemplary implementation of a part of FIG. 7;
- FIGS. 9 and 10 show examples of functions running in parallel with the function of FIG. 7;
- FIG. 11 shows an allocation of the logical spaces over the physical spaces as a variant of the correspondence represented in FIG. 6;
- FIG. 12 shows a variant of the function of FIG. 8 adapted to take account of the allocation of FIG. 11;
- FIG. 13 shows an exemplary implementation of a part of FIG. 12;
- FIG. 14 shows an example of a function running in parallel with the function of FIG. 7; and
- FIG. 15 shows a variant of FIGS. 8 and 12 which implements both the assignment shown in FIG. 6 and that shown in FIG. 11.
- FIG. 1 represents a general diagram of a computer system according to the invention.
- an application environment 2 has access to a file system manager 4.
- a virtualization layer 6 establishes the correspondence between the file system manager 4 and storage servers 8.
- FIG. 2 represents a logical implementation of the system of FIG. 1.
- a set of stations 10, also referred to herein as nodes, are interconnected in a network of which they constitute the physical and application resources.
- the network consists of 5 stations, denoted Ni with i varying between 1 and 5.
- the application environment 2 is made up of a distributed application layer 12 on the nodes N1, N2 and N3, an application layer 14 on the node N4, and an application layer 16 on the node N5.
- the file system manager 4 is realized as a distributed file system 18 and two non-distributed file systems 20 and 22.
- the file system 18 is distributed over the nodes N1, N2 and N3 and defines all the files accessible from the distributed application layer 12.
- the file systems 20 and 22 respectively define the set of files accessible from the application layers 14 and 16.
- the files designated by the file systems 18, 20 and 22 are stored in a virtual storage space 24 which is distributed over the set of Ni with i varying between 1 and 5.
- the virtual storage space 24 is here divided into a shared logical space 26, and two private logical spaces 28 and 30.
- the shared logical space 26 corresponds to the space accessible from the distributed application layer 12 by means of the distributed file system 18, and the private logical spaces 28 and 30 to the space accessible from the application layers 14 and 16 by means of the file systems 20 and 22.
- the logical space 26 is distributed over the nodes N1, N2 and N3, the private logical space 28 over the nodes N3 and N4, and the private logical space 30 over the node N5.
- an application of the layer 12 "sees" the data stored in the logical space 26 (respectively 28, 30) by means of the file system 18 (respectively 20, 22), although these data are not necessarily physically present on one of the storage disks of the station 10 that runs this application.
- the spaces 26, 28 and 30 are purely logical, that is, they do not directly represent physical storage spaces.
- Logical spaces are mapped using virtual addresses that are referenced or contained in file systems 18, 20, and 22.
- the correspondence module contains a table of correspondence between the virtual addresses of the data in the logical spaces and physical addresses that designate the physical storage spaces in which these data are actually stored.
- each station is used for both the application layer and the storage layer.
- This multifunctionality makes it possible to use the free space on all the stations of the network, rather than leaving this space unoccupied.
- any station can play an application node role, a storage node role, or both these roles at once.
- All the application, storage and file system resources can be integrated locally on each station, or distributed on the stations of the network.
- FIG. 3 represents an exemplary architecture of a station 10 of FIG. 2.
- the station represented in this example can be any one of the stations N1, N2 or N3.
- Station Nx individually has a structure similar to that of the global structure shown in Figure 1. It thus comprises an application layer 32, a file system 34, a virtualization layer 36 and a storage space 38 in the form of a local memory with direct access.
- the virtualization layer 36 comprises an engine 40 and a correspondence table 42.
- the direct access to the storage space 38 is managed by a storage client 44 and a storage server 46. The roles and operations of these elements will be specified below.
- the example described here represents an improved embodiment of the invention, in which all the resources, both application and storage, are distributed over the network.
- the same applies to the virtualization layer 36, the storage client 44 and the storage server 46.
- the distribution of these elements is managed by means of an administration module 48.
- the administration module 48 is mainly used during the creation and updating of the logical spaces.
- the administration module 48 calls the virtualization layer 36 to create the correspondence table between each virtual address of the logical space and a physical address on a given storage node.
- the correspondence table 42 contains information for re-establishing the correspondences.
- the engine 40 interacts with the table 42 to establish the corresponding physical address.
- the correspondence table 42 does not contain all the correspondences, but only a much smaller set of information, sufficient to re-establish a correspondence very quickly.
- Figure 4 shows a method implemented by the system to access a file.
- the access to a file by an application of the application layer of a given node is initialized by a file access request 50.
- the file access request 50 comprises: an identifier of the file concerned for the file system, and an address in this file;
- the size of the request, that is to say the number of bits to be accessed after the address in the targeted file.
- the file system determines one or more virtual addresses for the data of this file, and generates one or more virtual access requests based on the request 50 and these virtual addresses.
- Virtual access requests each include:
- the size of the request, that is to say the number of bits to be accessed following the targeted virtual address.
- step 52 consists in determining the logical space and the virtual address(es) on this space designated by the request 50, and in producing one or more "virtual" requests.
- a file access request will typically target the contents of a large number of virtual addresses, to enable the contents of a file to be reconstructed, whereas a virtual request targets the contents of the data block associated with one such address.
- the resulting virtual access request(s) are then transmitted to the virtualization layer, which determines the physical address(es) and the corresponding storage spaces in a step 54.
- the virtualization layer operates using the engine 40 and the correspondence table 42.
- in the case of a read request, the searched file already exists in a storage space 38, and the engine 40 calls the correspondence table 42 with the virtual address or addresses to determine, by correspondence, the physical address or addresses of the data of the file.
- in the case of a write request, the file does not necessarily exist beforehand in a storage space 38. Nevertheless, as seen above, the correspondences between virtual addresses and physical addresses are fixed, and the engine 40 therefore operates in the same way as for a read request to determine the physical address or addresses of the data.
- in a step 56, physical access requests are generated and transmitted to the storage client 44.
- the physical access requests are generated based on the request 50 and the physical address (es) determined in step 54.
- these requests include: the targeted physical address;
- the size of the request, that is to say the number of bits to be accessed following the physical address targeted by the request.
- the physical address and the size of the request are obtained directly from step 54, and the type of the request is inherited from the type of the virtual access request concerned.
- a loop is then initiated, in which a stopping condition 58 is reached when a physical access request has been issued to the storage client 44 for all physical addresses obtained in step 52.
- each physical access request is placed in a request queue of the storage client 44 for execution in a step 60.
- the storage client 44 may optionally include several queues, for example one request queue for each storage server 46 with which it interacts.
- in FIG. 4, all the physical access requests of step 56 are represented as being performed successively, for simplicity. However, the execution can also be carried out in parallel, and not only in series.
- requests are transmitted from layer to layer, up to the physical access layer.
- the storage client 44 interacts with the storage server 46 of the storage station that contains the storage space 38 on which the physical address designated by the physical access request concerned is located.
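- The layered flow of FIG. 4 can be summarized by a short sketch, given below in Python. It is only an illustration of the request descent described above, not the patented implementation: the class names, fields and callables (FileRequest, VirtualRequest, PhysicalRequest, to_virtual, to_physical) are hypothetical, and the splitting of request sizes across block boundaries is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class FileRequest:          # request 50: file identifier, address in the file, size, type
    file_id: str
    offset: int
    size: int
    is_write: bool

@dataclass
class VirtualRequest:       # produced by the file system in step 52
    virtual_address: int
    size: int
    is_write: bool

@dataclass
class PhysicalRequest:      # produced in step 56 and queued on the storage client 44
    disk_index: int
    physical_address: int
    size: int
    is_write: bool

def access_file(req: FileRequest,
                to_virtual: Callable[[FileRequest], List[VirtualRequest]],
                to_physical: Callable[[int], List[Tuple[int, int]]]) -> List[PhysicalRequest]:
    """Walk a file request down the layers: file system (step 52), then virtualization
    layer (step 54), then one physical request per determined address (steps 56 to 60)."""
    physical_requests: List[PhysicalRequest] = []
    for vreq in to_virtual(req):                                   # step 52
        for disk, phys_addr in to_physical(vreq.virtual_address):  # step 54: engine 40 + table 42
            physical_requests.append(
                PhysicalRequest(disk, phys_addr, vreq.size, vreq.is_write))  # step 56
    return physical_requests                                       # queued on the client 44 (step 60)
```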
- FIG. 5 represents an exemplary embodiment of the virtualization layer 36 of FIG. 3.
- the engine 40 includes a queue 402, an address determination unit 404 and a recovery unit 406.
- Queue 402 receives all virtual access requests for the determination of the corresponding physical addresses.
- the determination of the physical addresses is carried out by the address determination unit 404, in collaboration with the correspondence table 42.
- the correspondence table 42 contains only an extremely limited set of data which will be described later.
- the invention proposes several schemes for assigning virtual spaces to physical spaces. These assignments make it possible to determine the virtual address / physical address correspondences quickly and inexpensively on the basis of lightweight algorithms, while offering a high quality of service. This is much more efficient in terms of processor and memory occupation than the use of a direct correspondence table such as a look-up table.
- the function of the recovery unit 406 is to update certain physical addresses when a storage space of a given station has ceased to function, as will be described below.
- FIG. 6 illustrates a first scheme for allocating virtual spaces to physical spaces so as to tolerate so-called correlated failures.
- a correlated failure means a failure that renders inoperative a set of storage spaces or storage units (hereinafter disks) connected together (hereinafter a failure group).
- the disks can be grouped by failure group on the basis of a fault dependency criterion.
- a fault dependency criterion aims to bring together disks for which the probability of a simultaneous failure is high, so as to ensure that the data of these disks are not replicated on one of them.
- as examples of fault dependency criteria, mention may be made of belonging to the same node, from both a hardware and a software point of view, the link to the same network node, again from both a hardware and a software point of view, the proximity of geographical location, etc.
- the data at consecutive virtual addresses are gathered into hatches that extend over all the disks.
- the node N1 here comprises three disks MU11, MU12 and MU13, the node N2 three disks MU21, MU22 and MU23, the node N3 a disk MU31, and the node N4 three disks MU41, MU42 and MU43.
- each node forms a failure group, i.e., in terms of failure, it is assumed that the disks of a node depend on it, but that the disks of distinct nodes are independent of each other.
- the failure group N1 thus consists of the disks MU11 to MU13;
- the failure group N2 is composed of the disks MU21 to MU23;
- the failure group N3 is composed of the disk MU31;
- the failure group N4 is composed of the disks MU41 to MU43.
- the failure groups could be defined differently, for example by belonging to a given network unit. It is therefore important to understand that disks are first grouped by failure group and not only by node.
- the assignment of the failure group disks to a replication group follows an allocation algorithm that implements the constraints described above.
- the replication group GR1 comprises four disks, respectively the disks MU11, MU21, MU31 and MU43, and the other disks MU12 and MU13 of the failure group N1 are assigned respectively to the groups GR2 and GR3.
- Other disk allocation algorithms are possible, as those skilled in the art will recognize. It will be noted, for example, that for allocating the disks to the replication groups, it would also be possible to take into account the space available on the disks, so as to ensure replication groups of uniform size. Another possibility would be to take into account their performance to obtain homogeneous performance replication groups.
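- As an illustration of one such allocation algorithm, the sketch below deals the disks of each failure group out to the replication groups in round-robin fashion, so that no replication group ever receives two disks of the same failure group (assuming at least as many replication groups as the largest failure group has disks). This is only an assumed strategy; it does not necessarily reproduce the exact assignment of FIG. 6, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

def allocate_replication_groups(failure_groups: Dict[str, List[str]],
                                n_replication_groups: int) -> Dict[int, List[str]]:
    """Deal the disks of each failure group out to the replication groups in round-robin
    order, so that no replication group holds two disks of the same failure group."""
    replication_groups: Dict[int, List[str]] = defaultdict(list)
    for disks in failure_groups.values():
        if len(disks) > n_replication_groups:
            raise ValueError("a failure group has more disks than there are replication groups")
        for offset, disk in enumerate(disks):
            replication_groups[offset % n_replication_groups].append(disk)
    return dict(replication_groups)

# With the failure groups of FIG. 6 (node N3 has a single disk):
groups = allocate_replication_groups(
    {"N1": ["MU11", "MU12", "MU13"],
     "N2": ["MU21", "MU22", "MU23"],
     "N3": ["MU31"],
     "N4": ["MU41", "MU42", "MU43"]},
    n_replication_groups=3)
# groups[0] then holds one disk per failure group, e.g. ["MU11", "MU21", "MU31", "MU41"]
```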
- the virtual addresses are assigned on the disks sorted by replication group.
- each number represented denotes the association of a set of virtual addresses with physical addresses.
- the virtual addresses are grouped, in ascending order, into units of hatching ("striping units" in English) of defined size.
- the hatch units are themselves associated with physical addresses on the disks.
- the hatch units are assigned to all disks of all replication groups in ascending order, line by line.
- the virtual addresses are thus allocated in blocks to the replication groups, in increasing order.
- the first hatch unit of the first disk of the first replication group receives the index 0
- the first hatch unit of the second disk of the first replication group receives the index 1
- a line of hatch units will subsequently be called a real hatch. Note also, in the upper part of FIG. 6, that for every other line the hatch units are not ordered by increasing index.
- the physical addresses of the disk MU11 of N1 receive the following hatch units: 0, 3, 10, 12, 20, 21, 30, 33, 40 and 42.
- the data is replicated onto directly consecutive real hatches within the main hatch.
- the replicated data could be in non-consecutive real hatches.
- FIG. 7 shows an example of a function that makes it possible to find a physical address from a virtual address.
- this function is implemented in the address determination unit 404. It starts, in an operation 2000, from a virtual address, for example taken from the queue 402.
- a test is performed to determine whether the virtual address is related to a write request. If this is the case, in an operation 2040, a test is performed to determine whether the system is in a degraded state, that is, whether one of the disks is down. If this is the case, a Wrt_Drt_Z() function is called in an operation 2060.
- the operations 2040, 2060 and 2080 are connected to functions enabling rapid recovery from a disk failure, which will be described further with FIGS. 9 and 10.
- a function SU_Ind() is called in an operation 2070.
- the function SU_Ind() starts from the virtual address whose correspondence is sought, and returns the index SU_Ind of the hatch unit associated with it. This is easily achievable since the size of each hatch unit is known.
- a Get_Phy_Ind() function is called in an operation 2080, which determines the corresponding physical address indices.
- the physical address indices are converted into physical addresses by a function Phy_Ad(). This is easily achievable since the size of each hatch unit is known.
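- Since the hatch unit size is fixed and known, both conversions reduce to integer arithmetic, as sketched below. The size value and the byte-level layout are assumptions; only the division and modulo structure is implied by the text.

```python
SU_SIZE = 64 * 1024  # assumed fixed hatch unit ("striping unit") size, in bytes

def su_ind(virtual_address: int) -> int:
    """Operation 2070: index of the hatch unit that contains the virtual address."""
    return virtual_address // SU_SIZE

def phy_ad(real_hatch_index: int, virtual_address: int) -> int:
    """Phy_Ad()-style conversion: turn a physical index (here, the real hatch index on the
    target disk) back into a byte address, keeping the offset inside the hatch unit."""
    return real_hatch_index * SU_SIZE + (virtual_address % SU_SIZE)
```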
- the function Get_Phy_Ind() receives the following two arguments (operation 2082):
- the table Ngr[] is useful because it makes it possible to quickly find the total number of disks N and the number of replication groups Ngr, and to access just as quickly the number of disks per replication group.
- as a variant, N and Ngr are passed as arguments, but a computation is then necessary whenever the number of disks within a given replication group is needed.
- the role of the Get_Phy_Ind() function is to find the index of the hatch unit in the ordering by failure group from its index in the ordering by replication group.
- a function Strip() determines a main hatch index k, as well as the index m1, in the ordering by replication group, of the first disk on which the virtual address is stored. This is accomplished by applying Equations 20 and 21 of Appendix A.
- a function Repl() determines the real hatch indices k1 and k2 to account for data replication. This is accomplished by applying Equations 22 and 23 of Appendix A.
- a function Spl() determines the index p of the replication group that corresponds to the disk index m1, as well as the index q1 of the disk m1 within this replication group. This is accomplished by applying Equations 24 and 25 of Appendix A.
- a function Shft() determines the index q2, within the replication group of index p, of the disk on which the replicated data is stored. This is accomplished by applying Equation 26 of Appendix A.
- each disk within a replication group contains all replicated data from another disk in the same group.
- a function Mrg() determines a disk index m2 which corresponds to the index q2 within the replication group of index p. This is accomplished by applying Equation 27 of Appendix A.
- the indices m1 and m2 of the disks classified by replication group are converted into disk indices n1 and n2 of the disks classified by failure group by a function Get_Dsk_Ind().
- This function performs the inverse operation of equation 12 and applies equations 28 and 29 of Appendix A.
- the Get_Phy_Ind() function returns the physical address indices thus determined in an operation 2099.
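- Appendix A (Equations 20 to 29) is not reproduced in this text, so the sketch below is only one plausible reading of the prose description of Get_Phy_Ind(): line-by-line striping over the disks sorted by replication group, a replication factor of two with the replica placed on the next disk of the same group, and the conversion to the failure-group ordering expressed as a precomputed permutation. The function names follow the text; the arithmetic itself is an assumption, not the patented formulas.

```python
from typing import List, Tuple

def get_phy_ind(su_index: int, ngr: List[int],
                repl_to_fail: List[int]) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Sketch of Get_Phy_Ind(): map a hatch unit index to two (disk index in the
    failure-group ordering, real hatch index) pairs, i.e. the original and its replica.

    ngr[p] is the number of disks of replication group p; repl_to_fail is the permutation
    produced by the disk allocation (replication-group ordering -> failure-group ordering)."""
    n_disks = sum(ngr)

    # Strip(): main hatch index k and first disk m1 (stand-in for Equations 20 and 21).
    k = su_index // n_disks
    m1 = su_index % n_disks

    # Repl(): two consecutive real hatches in main hatch k, assuming a replication
    # factor of two (stand-in for Equations 22 and 23).
    k1, k2 = 2 * k, 2 * k + 1

    # Spl(): replication group p of disk m1 and its index q1 within that group
    # (stand-in for Equations 24 and 25).
    p, base = 0, 0
    while m1 >= base + ngr[p]:
        base += ngr[p]
        p += 1
    q1 = m1 - base

    # Shft(): disk of the same group holding the replica, here the next one cyclically
    # (stand-in for Equation 26).
    q2 = (q1 + 1) % ngr[p]

    # Mrg(): back to a global index in the replication-group ordering (stand-in for Equation 27).
    m2 = base + q2

    # Get_Dsk_Ind(): indices in the failure-group ordering (stand-in for Equations 28 and 29).
    n1, n2 = repl_to_fail[m1], repl_to_fail[m2]
    return (n1, k1), (n2, k2)
```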
- the principle of this reintegration is based on marking, during writes, the virtual zones associated with a failed disk.
- the first table contains two rows, one of which receives disk identifiers and the other an increasing index.
- the second table contains two rows, one of which receives zone identifiers, and the other a modification index.
- the fault table is stored on each of the disks in a space reserved by the administration module which is not usable for storing the data. This table is kept consistent on all the disks by the administration module. This table has a fixed size equal to the total number of disks.
- the fault table is filled by means of the function shown in FIG. 9.
- this function receives the identifiers of all the failed disks.
- a function Is_Tgd_Dsk() searches for each failed disk identifier in the fault table.
- for each identifier missing from the fault table, a new entry is created that receives the disk identifier in the first row, and an incremented index in the second row (operation 2204). Otherwise the function processes the next identifier (operation 2206).
- in a variant, the fault table is implemented as a stack or linked list. In that case, it comprises only one row, which receives the disk identifiers, and it is the position of each identifier in the table that serves as the increasing index.
- the Wrt_Drt_Z() function relies on the indices of the fault table to maintain an up-to-date view of the zones associated with a failed disk that have been modified.
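- A minimal sketch of the fault table, the zone table and a Wrt_Drt_Z()-style marking step is given below. It only illustrates the index bookkeeping described above (fault indices that only grow, zone indices recording posteriority with respect to the most recent failure); the data structures and names are assumptions, not the patent's implementation.

```python
from typing import Dict, List

class DegradedWriteTracker:
    """Fault table: failed disk identifier -> increasing fault index.
    Zone table: virtual zone identifier -> modification index, set during degraded writes."""

    def __init__(self) -> None:
        self.fault_table: Dict[str, int] = {}
        self.zone_table: Dict[int, int] = {}
        self._next_fault_index = 1

    def register_failures(self, failed_disk_ids: List[str]) -> None:
        """Function of FIG. 9: create an entry for every failed disk not already recorded
        (the Is_Tgd_Dsk() test), with an index higher than all indices already present."""
        for disk_id in failed_disk_ids:
            if disk_id not in self.fault_table:
                self.fault_table[disk_id] = self._next_fault_index
                self._next_fault_index += 1

    def wrt_drt_z(self, zone_id: int) -> None:
        """Wrt_Drt_Z()-style marking: record, for the zone being written, an index indicating
        posteriority with respect to the most recent failure present in the fault table."""
        if self.fault_table:
            self.zone_table[zone_id] = max(self.fault_table.values())
```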
- the recovery unit 406 may perform the function of FIG. 10 to restore a disk after a failure. For this, the unit 406 starts with an index i set to zero (operation 2300) and goes through the zone table.
- a function Upd_Z(i) updates the data of the zone concerned by retrieving the corresponding replicated data (operation 2304).
- the zone table is updated to reflect this operation (operation 2306).
- an End_Drt_Zone() function deletes the entry of the fault table associated with the restored disk, and goes through the zone table to bring the indices of the zones down to the maximum of the remaining indices. This ensures slow index growth and avoids processing too much data.
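- Continuing the sketch above (same assumptions, hypothetical names), the recovery loop of FIG. 10 could look as follows: the zones whose modification index indicates posteriority with respect to the fault index of the recovered disk are rebuilt from their replicated copies, after which an End_Drt_Zone()-style cleanup removes the fault entry and caps the remaining zone indices.

```python
from typing import Callable

def recover_disk(tracker: "DegradedWriteTracker", disk_id: str,
                 upd_z: Callable[[int], None]) -> None:
    """Loop of FIG. 10 over the zone table, then End_Drt_Zone()-style cleanup.
    upd_z(zone_id) stands for Upd_Z(i): rewrite the zone from its replicated copy."""
    fault_index = tracker.fault_table[disk_id]

    # Rebuild every zone marked after (or at) this disk's failure (operations 2300 to 2306).
    for zone_id, zone_index in list(tracker.zone_table.items()):
        if zone_index >= fault_index:
            upd_z(zone_id)                          # Upd_Z(i): restore from the replica

    # End_Drt_Zone(): remove the fault entry and cap the remaining zone indices so that
    # indices grow slowly and no more data than necessary is processed later.
    del tracker.fault_table[disk_id]
    ceiling = max(tracker.fault_table.values(), default=0)
    for zone_id, zone_index in tracker.zone_table.items():
        if zone_index > ceiling:
            tracker.zone_table[zone_id] = ceiling
```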
- the zone table can receive zones of configurable size. Indeed, an entry in the zone table is associated with a plurality of contiguous virtual addresses.
- this table is stored in the reserved data area of the logical volume to which the virtual addresses belong, that is, it is itself distributed over all the disks. Note that the reserved data area of the logical volume is not indefinitely extensible. It should also be noted that an access to the zone table constitutes a read request in the system.
- the functions of FIGS. 9 and 10 can be seen as functions that loop in parallel with the main execution of the system. Indeed, to ensure maximum safety of the information, these functions behave as "interrupts".
- Figure 11 shows an assignment of virtual spaces to physical spaces as an alternative to that of Figure 6.
- fault tolerance is also increased, this time by voluntarily leaving free spaces, called resilience units, within the replication groups.
- for the sake of clarity, FIG. 11 shows a single replication group that includes seven disks. Among these seven disks, hatches are defined with four hatch units for storing data, and three resilience units for fault tolerance.
- the disks are divided into several groups to avoid correlated failures, that is to say failures affecting several disks at a time.
- the definition of the resilience units in addition to the hatch units amounts to distributing part of the physical addresses into a working group (those receiving the data of the hatch units) on the one hand, and into fault groups (those receiving the data of the resilience units) on the other hand, according to a fault tolerance criterion.
- Such fault tolerance criterion relates for example to the number of successive failures that one wishes to support, and therefore the number of fault groups to manage. In the present example, this number is three. Other criteria could nevertheless be used.
- a processing block A for determining the disk and hatch indices as before, without taking into account the presence of the resilience units
- the function Get_Phy_Ind() receives the following three arguments (operation 2482):
- the Strip() function determines a main hatch index k, as well as the index mm1 of the first disk on which the virtual address is stored.
- operation 2484 differs from operation 2084 of FIG. 8 in that the Strip() function is called with the number of hatch units, i.e. the number of disks minus the number of resilience units.
- the function Repl() determines the real hatch indices k1 and k2 to account for the replication of the data, as in operation 2086.
- the Shft() function determines an index mm2 of the disk that receives the replicated data. Operation 2490 differs from operation 2090 of FIG. 8 in that the function Shft() is called with the number of hatch units, i.e. the number of disks minus the number of resilience units.
- a function Cp_Spr() determines an index m1 which corresponds to the real index of the disk associated with the index mm1. This function is used to modify the index mm1 to take into account the presence of the resilience units. As will be seen below, the index m1 returned by the function Cp_Spr() can designate a resilience unit.
- a function Cp_Spr() determines an index m2 which corresponds to the real index of the disk associated with the index mm2. This function is used to modify the index mm2 to take into account the presence of the resilience units. As will be seen below, the index m2 returned by the function Cp_Spr() can designate a resilience unit.
- the physical address indices are returned in 2499.
- the function Cp_Spr() will now be described with FIG. 13. This function receives as arguments a disk index mm, a hatch index k, a total number of disks N and a number of resilience units S.
- the Cp_Spr() function starts by executing a Spr() function in an operation 2602.
- the Spr() function implements Equation 30 of Appendix A.
- the Spr() function receives three input arguments, among them the disk index mm.
- the function Spr() thus makes it possible to establish an index m which takes into account the presence of the S resilience units.
- a test determines, in an operation 2604, whether the disk associated with the real index m has failed and whether a resilience unit has been assigned to it.
- an example of implementation of operation 2604 is the maintenance of a table, here called the resilience table.
- this table contains a single row, in which each column corresponds to the disk whose index m equals the column number.
- Such a resilience table is stored on each of the disks and is synchronized continuously, together with the fault table for example.
- the index mm is updated in an operation 2606 by a function Spr_2() which implements Equation 31 of Appendix A, using as arguments the total number of disks N, the number of resilience units S, and the index m that has just been calculated.
- this function assumes that the data of the disk of index m is stored, in each hatch, on the resilience unit whose resilience index is indicated by the resilience table, and therefore needs only the hatch index k to determine the index of the disk on which the desired resilience unit is allocated.
- the Cp_Spr() function is then restarted.
- the function Cp_Spr() is repeated as many times as necessary, so that the returned index corresponds to a resilience unit on a functional disk.
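- Equations 30 and 31 belong to the non-reproduced Appendix A, so the following is only a structural sketch of Cp_Spr(): the arithmetic of Spr() and Spr_2() is injected as callables, and the resilience table is reduced to a dictionary from failed-disk index to assigned resilience index. All signatures are assumptions.

```python
from typing import Callable, Dict, Optional

def cp_spr(mm: int, k: int, n_disks: int, n_resilience: int,
           spr: Callable[[int, int, int], int],
           spr_2: Callable[[int, int, int, int], int],
           resilience_table: Dict[int, Optional[int]]) -> int:
    """Structural sketch of Cp_Spr() (FIG. 13).

    spr(mm, n_disks, n_resilience) plays the role of Spr() (Equation 30): it turns an index
    mm counted over the data hatch units only into a real disk index m.
    spr_2(n_disks, n_resilience, m, k) plays the role of Spr_2() (Equation 31): for hatch k,
    it returns the updated index pointing at the resilience unit that replaces failed disk m.
    resilience_table[m] holds the resilience index assigned to failed disk m, or None."""
    m = spr(mm, n_disks, n_resilience)             # operation 2602
    while resilience_table.get(m) is not None:     # test of operation 2604
        mm = spr_2(n_disks, n_resilience, m, k)    # operation 2606: follow the redirection
        m = spr(mm, n_disks, n_resilience)         # Cp_Spr() is restarted
    return m                                       # index of a functional disk
```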
- this function receives the identifiers of all the failed disks.
- a function Spr_Dsk() modifies the resilience index of the failed disk with which no resilience unit is yet associated.
- the value of its resilience index receives the first resilience index not already assigned.
- the Wrt_Spr_Dsk() function generates requests for writing the data available on the remaining hatch unit to the resilience unit, and these requests are executed in competition with the other access requests. This means that the resilience unit cannot be used until these write requests have been completed. Finally, the function ends in an operation 2706.
- in a variant, the Wrt_Spr_Dsk() function generates the write requests on the resilience units and executes these requests before any other access request.
- This function can therefore be performed directly in the virtualization layer or in the administration module, as a separate function as presented here, or as an integral part of the functions presented above.
- the function shown in FIG. 15 is a mixture of the variants of the Get_Phy_Ind() function described with FIGS. 8 and 12. Thus, it has great similarities with them, which is why some operations have not been renumbered.
- the Get_Phy_Ind() function receives the following arguments:
- the table Ngr[] is useful because it makes it possible to quickly find the total number of disks N and the number of replication groups Ngr, and to access just as quickly the number of disks per replication group.
- as a variant, N and Ngr are passed as arguments, but a computation is then necessary whenever the number of disks within a given replication group is needed.
- operations 2484 and 2486 are then performed identically to those of FIG. 12, and operation 2488 is performed as operation 2088 of FIG. 8.
- the operations 2490 to 2494 are performed as in the case of FIG. 12, taking into account that these operations are performed within a replication group.
- the indices q1 and q2 of the disks within the replication group p are then transformed into disk indices m1 and m2 in operations 2496 and 2497, in a similar manner to operation 2097 of FIG. 8.
- the indices m1 and m2 of the disks classified by replication group are converted into indices n1 and n2 of the disks classified by failure group in an operation 2498, as in operation 2098 of FIG. 8.
- this embodiment is very advantageous because it offers a very high tolerance to various failures, both through the distribution of the data on disks belonging to different failure groups and through the use of the resilience units, which can contain failures within the replication groups.
- the hatch units are here described as having a fixed size. It would nevertheless be possible to implement the invention by assigning different hatch unit sizes according to the replication groups, with some modifications to the described functions.
- the engine and correspondence table described herein form a correspondence module capable of assigning the virtual addresses to physical addresses on the basis of a rule comprising an arithmetic formula defined by the above functions.
- although these functions constitute the essence of the assignment rule, it would be possible to use a rule that assigns a certain part of the virtual addresses in a fixed manner, and assigns another part of the virtual addresses by means of functions.
- obtaining addresses that are called "physical addresses" here represents above all a de-abstraction by the virtualization layer. These addresses could indeed themselves be virtual addresses of a logical space of a storage server. In the same way, the virtual addresses received upstream could themselves correspond to a de-abstraction of a higher virtualization layer.
- the application that accesses the stored data may include a driver that manages the relationships between the various elements, such as the application / file system interaction, the file system / correspondence module interaction, and the correspondence module / storage client interaction, implementing the storage policy by obtaining a result from each element and calling the next element with that result (or a modified form of that result).
- the system is autonomous and does not depend on the application that calls the data: the elements are able to communicate with each other, so that the information goes down and then back up the layers, element by element.
- the communications between these elements can be provided in different ways, for example by means of the POSIX interface, the IP, TCP or UDP protocols, shared memory, or RDMA (Remote Direct Memory Access). It should be borne in mind that the object of the invention is to provide the advantages of specialized storage systems on the basis of existing network resources.
- a specialized or general-purpose processor, for example of the CISC or RISC type, or of another type,
- one or more storage disks, for example Serial ATA or SCSI hard disk drives, or any other type of storage, and
- a network interface, for example Gigabit Ethernet, Infiniband or SCI;
- an application environment based on an operating system, e.g. Linux,
- an application set for implementing the correspondence module, for example the Clustered Logical Volume Manager module of the Exanodes (registered trademark) application of the company Seanodes (registered trademark),
- This type of system can be realized in a network comprising: conventional user stations, adapted for application use on a network and acting as application nodes, and
- the invention encompasses the computer system comprising the application nodes and the storage nodes as a whole. It also encompasses the individual elements of this computer system, and in particular the application nodes and the storage nodes in their individuality, as well as the various means for realizing them.
- the data management method is to be considered in its entirety, that is to say in the interaction of the application nodes and the storage nodes, but also in the individuality of the computer stations adapted to realize the application nodes and the storage nodes of this method.
- the invention also covers a method for storing data comprising determining at least a first and a second storage address from a virtual address received at the input, the storage addresses being associated with storage units, and storing data associated with said virtual address in said first and second determined storage addresses, the method being characterized in that: a first table comprising data for identification of failed storage units, and a second table comprising virtual address block modification data are maintained, and
- upon the recovery of a failed storage unit, the storage addresses of this storage unit are updated from virtual addresses taken from the second table, on the basis of the data of the first table.
- the method may further be characterized in that: upon the modification of a virtual address, the second table stores an index for the block of virtual addresses to which this address belongs, said index being defined to indicate posteriority with respect to the most recent failure indicated in the first table;
- the storage addresses of this storage unit which correspond in the correspondence module to virtual addresses which belong to blocks of virtual addresses whose index in the second table indicates a posteriority relative to the index of the storage unit restored in the first table are updated;
- the first table stores an index for this storage unit, said index being defined higher than the indices already defined in the first table;
- the second table stores an index for the virtual address block to which the address belongs, said index being defined as equal to the maximum of the indices defined in the first table at the time of the modification;
- the storage addresses of this storage unit which correspond in the correspondence module to virtual addresses which belong to blocks of virtual addresses whose index in the second table is greater than or equal to the index of the storage unit restored in the first table are updated;
- the first table is updated by deleting the index corresponding to the reinstated storage unit, and the second table is updated by setting the indices of the second table that are greater than the maximum of the indices defined in the first updated table as equal to this maximum.
- the invention also covers, as products, the software elements described, made available on any computer-readable medium.
- computer-readable media include magnetic, optical and/or electronic data storage media, as well as transmission media or vehicles, such as an analog or digital signal.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Hardware Redundancy (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0707189A FR2922335A1 (fr) | 2007-10-12 | 2007-10-12 | Systeme informatique ameliore comprenant plusieurs noeuds en reseau. |
PCT/FR2008/001088 WO2009047397A2 (fr) | 2007-10-12 | 2008-07-23 | Système informatique amélioré comprenant plusieurs noeuds en réseau |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2212791A2 true EP2212791A2 (fr) | 2010-08-04 |
Family
ID=39048748
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP08837674A Withdrawn EP2212791A2 (fr) | 2007-10-12 | 2008-07-23 | Système informatique amélioré comprenant plusieurs noeuds en réseau |
EP08841064A Withdrawn EP2212792A1 (fr) | 2007-10-12 | 2008-08-04 | Système informatique amélioré comprenant plusieurs noeuds en réseau |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP08841064A Withdrawn EP2212792A1 (fr) | 2007-10-12 | 2008-08-04 | Système informatique amélioré comprenant plusieurs noeuds en réseau |
Country Status (3)
Country | Link |
---|---|
EP (2) | EP2212791A2 (fr) |
FR (1) | FR2922335A1 (fr) |
WO (3) | WO2009047398A2 (fr) |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4498146A (en) * | 1982-07-30 | 1985-02-05 | At&T Bell Laboratories | Management of defects in storage media |
US5636356A (en) * | 1992-09-09 | 1997-06-03 | Hitachi, Ltd. | Disk array with original data stored in one disk drive and duplexed data distributed and stored in different disk drives |
US5559764A (en) * | 1994-08-18 | 1996-09-24 | International Business Machines Corporation | HMC: A hybrid mirror-and-chained data replication method to support high data availability for disk arrays |
JP3344907B2 (ja) * | 1996-11-01 | 2002-11-18 | 富士通株式会社 | Raid装置及び論理ボリュームのアクセス制御方法 |
EP1162537B1 (fr) * | 2000-06-09 | 2007-09-26 | Hewlett-Packard Company, A Delaware Corporation | Utilisation d'espace disque inutilisé sur les ordinateurs gérés en réseau |
GB2378277B (en) * | 2001-07-31 | 2003-06-25 | Sun Microsystems Inc | Multiple address translations |
US7035922B2 (en) * | 2001-11-27 | 2006-04-25 | Microsoft Corporation | Non-invasive latency monitoring in a store-and-forward replication system |
2007
- 2007-10-12 FR FR0707189A patent/FR2922335A1/fr not_active Withdrawn
2008
- 2008-07-23 WO PCT/FR2008/001089 patent/WO2009047398A2/fr active Application Filing
- 2008-07-23 EP EP08837674A patent/EP2212791A2/fr not_active Withdrawn
- 2008-07-23 WO PCT/FR2008/001088 patent/WO2009047397A2/fr active Application Filing
- 2008-08-04 WO PCT/FR2008/001168 patent/WO2009053557A1/fr active Application Filing
- 2008-08-04 EP EP08841064A patent/EP2212792A1/fr not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
See references of WO2009047397A3 * |
Also Published As
Publication number | Publication date |
---|---|
WO2009053557A9 (fr) | 2010-04-22 |
WO2009047397A3 (fr) | 2009-06-04 |
WO2009053557A1 (fr) | 2009-04-30 |
WO2009047398A3 (fr) | 2009-06-04 |
EP2212792A1 (fr) | 2010-08-04 |
WO2009047398A2 (fr) | 2009-04-16 |
WO2009047397A2 (fr) | 2009-04-16 |
FR2922335A1 (fr) | 2009-04-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| 17P | Request for examination filed | Effective date: 20100512 |
| AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR |
| AX | Request for extension of the european patent | Extension state: AL BA MK RS |
| RIN1 | Information on inventor provided before grant (corrected) | Inventor name: RICHARD, SAMUEL; Inventor name: DUSSERE, MICHAEL |
| 17Q | First examination report despatched | Effective date: 20120214 |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| 18D | Application deemed to be withdrawn | Effective date: 20120201 |
| DAX | Request for extension of the european patent (deleted) | |