WO2016073019A1

WO2016073019A1 - Generating a unique identifier for an object in a distributed file system

Info

Publication number: WO2016073019A1
Application number: PCT/US2015/010238
Authority: WO
Inventors: Rajesh Kumar Chaurasia; Shruti Doval; Ashutosh Kumar; Narayanan A N; Padmagandha PANIGRAHY; Alban Kit LYNDEM; Anand A GANJIHAL
Original assignee: Hewlett Packard Enterprise Development Lp
Priority date: 2014-11-04
Filing date: 2015-01-06
Publication date: 2016-05-12

Abstract

Systems and methods for generating a unique identifier for an object in a distributed file system. In accordance with an example, the generating includes determining whether a location identifier for creation of the object in the distributed file system is being reused. Based on the determining, it is ascertained whether each reuse identifier from a set of reuse identifiers available for allocation for the location identifier has been assigned once. Based on the ascertaining, a reuse identifier is assigned to the object. Further, the unique identifier is generated for the object based on at least the location identifier and the reuse identifier.

Description

GENERATING A UNIQUE IDENTIFIER FOR AN OBJECT IN A DISTRIBUTED FILE SYSTEM

BACKGROUND

[0001] Distributed file systems (DFSs) are widely used in setups in which large numbers of objects, such as files, directories, links, and other file system objects, are to be stored and accessed. A DFS can include one or more central servers for storing files accessible by a plurality of remotely located client devices over a network. To access a file on a server, a client device retrieves the file from the server and stores the file locally on a data repository of the client device. When the operation on the file is completed, the modified file is returned to the server. Therefore, to a user working on the client device, the file appears to be stored locally on the client device; to the client device, the DFS provides centralized storage, so that the client device does not use its own resources to store the file.

BRIEF DESCRIPTION OF FIGURES

[0002] The detailed description is provided with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the figures to reference like features and components.

[0003] Figure 1A illustrates a schematic of an identifier generation system for generating a unique identifier for an object created in a distributed file system, according to an example of the present subject matter.

[0004] Figure 1B illustrates a detailed schematic of the identifier generation system, according to an example of the present subject matter.

[0005] Figure 2 illustrates a method for generating a unique identifier for an object in a distributed file system, according to an example of the present subject matter.

[0006] Figure 3 illustrates a detailed method for generating the unique identifier for the object in the distributed file system, according to an example of the present subject matter.

[0007] Figure 4 illustrates a computer readable medium storing instructions to generate a unique identifier for an object in a distributed file system, according to an example of the present subject matter.

DETAILED DESCRIPTION

[0008] A distributed file system (DFS) may include a plurality of nodes each having one or more storage units associated therewith for storing objects, such as data files and metadata. Further, the storage units associated with a node can be grouped together to form one or more segments managed by the node. For accessibility of the objects in the DFS, the DFS may employ a uniform identification scheme to keep track of the objects and the locations thereof in nodes and in segments. For instance, the DFS may generate and associate an identifier with each object in the DFS when the object is created in the DFS.

[0009] Generally, various techniques are employed for generating and allocating the identifier for each object on the DFS. According to one known technique, a global name space can be defined in which the addresses and names of all objects and locations, starting from a root of the DFS, are considered. In such a case, when an object is to be located for access, the entire directory of the DFS is traversed, making the process computationally cumbersome and inefficient. According to other known techniques, different parameters associated with the object can be employed to generate the identifier for the object. For instance, the timestamp of creation, the ID of the segment on which the object is created, the location identifier of the location in the segment in which the object is created, and segment-level and data block-level identifiers can be used as the identifier for the object. However, such parameters, whether used individually or in combination, may not be sufficient for uniquely identifying the objects, leading to incorrect identification of objects and corruption of data associated with the objects. Still other generally employed techniques involve allocation of an identifier for each object by a user creating the object. However, such an allocation may not be foolproof, and the identifiers may be susceptible to modifications.

[0010] Systems and methods for generating a unique identifier for an object in a distributed file system (DFS) are described herein. In an example, an object can include a file, metadata associated with the file, or any other instance of data stored in the file system. The present subject matter provides an approach for generating a unique identifier that can uniquely identify the object at various levels in the DFS. For instance, the unique identifier generated in accordance with the present subject matter can be used for identifying the object at a global level in the DFS, i.e., an external device or application accessing the object from outside the DFS is able to identify the object uniquely without any errors. In another example, the unique identifier can be used for identifying the object at a segment level and at the level of the location of the object in the segment.

[0011] In an example, the present subject matter can be based on a set of reuse identifiers associated with a location identifier in the DFS, for instance, a location identifier on a segment in the DFS. The location identifier may refer to a storage location on a segment in the DFS, for example, an inode slot, from which a previous object may have been removed, or in which no object may have been created previously. The reuse identifier can be a counter that indicates the number of times the location identifier has already been used by the DFS for creating and storing objects in the segment. From the set of reuse identifiers, an identifier can be allocated to each object created with that location identifier. When the set of reuse identifiers is exhausted, i.e., when all the reuse identifiers from the set have been allocated, the location identifier can be marked as non-usable for further creation of objects.

[0012] In an example, the location identifier available for allocation to the object can be determined first, and then the reuse identifier for the location identifier can be allocated to the object. Accordingly, in an example of the present subject matter, it is determined whether a location identifier for creation of the object at a location in the distributed file system is being reused. The location can be a node or a segment of the DFS. Based on whether the location identifier is being reused, a reuse identifier can be allocated to the object. In case the location identifier is being reused, it is determined whether all the reuse identifiers in the set of reuse identifiers associated with that location identifier have been allocated once, i.e., whether the set of reuse identifiers has been exhausted. If the set has not been exhausted, the object is created with that location identifier and is allocated a reuse identifier from among the reuse identifiers in the set that have not yet been assigned. In an example, the reuse identifier to be allocated is determined based on a preceding reuse identifier allocated to a previous object with the same location identifier. Even so, the reuse identifier is always drawn from the set of reuse identifiers associated with that particular location identifier.
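The decision flow of paragraph [0012] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: it assumes the set of reuse identifiers is the integers 0 to `set_size - 1`, that allocation is sequential (each new reuse identifier is the preceding one plus 1), and it uses a hypothetical in-memory dictionary, `last_assigned`, in place of whatever allocation data the DFS actually persists.

```python
from typing import Dict, Optional


def allocate_reuse_id(location_id: int,
                      last_assigned: Dict[int, int],
                      set_size: int) -> Optional[int]:
    """Return a reuse identifier for a new object at `location_id`, or None if
    the set of reuse identifiers for that location identifier is exhausted.

    `last_assigned` (hypothetical bookkeeping) maps each location identifier to
    the reuse identifier most recently assigned with it; the set of reuse
    identifiers is assumed to be 0 .. set_size - 1.
    """
    if location_id not in last_assigned:
        # Not being reused: the location identifier is used for the first time.
        reuse_id = 0
    else:
        # Being reused: derive the next value from the preceding reuse identifier.
        reuse_id = last_assigned[location_id] + 1
        if reuse_id >= set_size:
            # Each reuse identifier has been assigned once: the set is exhausted.
            return None
    last_assigned[location_id] = reuse_id
    return reuse_id
```

A `None` result corresponds to the case in which the location identifier is marked as dead and another location identifier has to be tried.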

[0013] However, if the set of reuse identifiers has been exhausted, i.e., when each reuse identifier from the set of reuse identifiers has been assigned once, the location identifier is marked as dead for further use. Another location identifier can then be selected for creating the object, and the same assessment as above is made for that location identifier.

[0014] Further, a unique identifier is generated for the object based on at least the allocated reuse identifier and the location identifier. In an example, at least the reuse identifier is appended to the location identifier to generate the unique identifier for the object within a segment in the DFS; this combination of the reuse identifier with the location identifier can uniquely identify the object within that segment. In other words, in said example, the reuse identifier used along with the location identifier can be a segment location-level identifier. According to an aspect, the location identifier along with the reuse identifier can be a base-level identifier on the basis of which the object can be identified across the DFS at different levels, by appending other identifiers for the different levels, referred to as auxiliary identifiers, such as a segment identifier. In another case, the location identifier can be a combination of a plurality of identifiers, each identifier being for a different level in the DFS. Accordingly, the unique identifier can uniquely identify the object across levels of the DFS.
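A minimal sketch of composing the identifiers described in paragraph [0014], assuming integer fields and hypothetical type names (`SegmentLocationId`, `DfsLevelId`, `make_dfs_id`): the segment location-level identifier pairs the location identifier with the reuse identifier, and a segment identifier is prefixed as an auxiliary identifier to obtain a DFS-level identifier. The colon-separated rendering mirrors the example in paragraph [0049].

```python
from typing import NamedTuple


class SegmentLocationId(NamedTuple):
    """Segment location-level identifier: location identifier + reuse identifier."""
    location_id: int  # e.g. an inode number (inum)
    reuse_id: int     # e.g. an inode generation number


class DfsLevelId(NamedTuple):
    """DFS-level identifier: an auxiliary segment identifier prefixed to the
    segment location-level identifier."""
    segment_id: int
    local: SegmentLocationId

    def __str__(self) -> str:
        return f"{self.segment_id}:{self.local.location_id}:{self.local.reuse_id}"


def make_dfs_id(segment_id: int, location_id: int, reuse_id: int) -> DfsLevelId:
    """Build the DFS-level identifier from its three parts."""
    return DfsLevelId(segment_id, SegmentLocationId(location_id, reuse_id))
```

Because each level is a separate field, the same object can be compared at the segment location level (`uid.local`) or across the whole DFS (`uid`).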

[0015] The above systems and methods are further described in the figures and associated description below. It should be noted that the description and figures merely illustrate the principles of the present subject matter. Therefore, various arrangements that embody the principles of the present subject matter, although not explicitly described or shown herein, can be devised from the description and are included within its scope.

[0016] Figure 1A illustrates components of an identifier generation system 100, according to an example of the present subject matter. The identifier generation system 100 may include, for example, a processor 102 and modules 104 communicatively coupled to the processor 102. The processor 102 may include microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any other devices that manipulate signals and data based on computer-readable instructions. Further, functions of the various elements shown in the figures, including any functional blocks labeled as "processor(s)", may be provided through the use of dedicated hardware as well as hardware capable of executing computer-readable instructions.

[0017] The modules 104, amongst other things, include routines, programs, objects, components, and data structures, which perform particular tasks or implement particular abstract data types. The modules 104 may also be implemented as signal processor(s), state machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the modules 104 can be implemented by hardware, by computer-readable instructions executed by a processing unit, or by a combination thereof. The modules 104 can include a location identifier management module 106 and an identifier allocation module 108.

[0018] The identifier generation system 100 can generate a unique identifier for an object created in a distributed file system (DFS), for example, for identifying the object at various levels in the DFS. In one case, the identifier generation system 100 can generate the unique identifier for identifying the object at a global level in the DFS, i.e., when accessing the object from outside the DFS, at a segment level, or at the level of the location of the object in the segment, referred to as segment location-level. In one example, the identifier generation system 100 can generate the unique identifier for the object associated with a location in the DFS based on a set of identifiers associated with the location and available for allocation to objects created at the location. Further, when the identifier generation system 100 ascertains that all the identifiers from the set have been exhausted, i.e., have been allocated once to an object, the identifier generation system 100 does not allow that location identifier to be used for further creation of objects.

[0019] In an example, the identifier generation system 100 can first determine a location identifier available for allocation to the object and then determine a reuse identifier for allocation to the object. For instance, the location identifier may refer to a storage location on a segment in the DFS from which a previous object may have been removed, or in which no object may have been created previously, and the reuse identifier can be a counter that indicates the number of times the location identifier has already been used by the DFS for creating and storing objects in the segment.

[0020] According to an aspect, the location identifier management module 106 can determine whether the location identifier in the segment of the DFS with which the object is to be created is being reused. Based on whether the location identifier is being reused, the location identifier management module 106 ascertains whether each reuse identifier from the set of reuse identifiers available for allocation for that location identifier has been assigned once to an object created with that location identifier. For instance, in case the location identifier has not been used before, i.e., is not being reused, the location identifier management module 106 may skip the above ascertaining, since in such a case none of the reuse identifiers for that location identifier has been assigned at all. The location identifiers can include an identifier associated with the location of creation of the object. For example, the location identifier and the associated reuse identifier can be used to identify the object within a segment of the DFS. In other words, in said example, the location identifier used along with the reuse identifier can be a segment location-level identifier.

[0021] Further, based on the ascertaining by the location identifier management module 106, the identifier allocation module 108 can allocate a reuse identifier from the set of reuse identifiers to the object. In an example, when the location identifier management module 106 determines that the location identifier is not being reused, i.e., is being used for creating and storing an object for the first time, the identifier allocation module 108 can allocate the reuse identifier to the object. In another example, when the location identifier management module 106 determines that the location identifier is being reused but not all the reuse identifiers from the set have been allocated, i.e., the set is not exhausted, the identifier allocation module 108 may, in such a case too, assign a reuse identifier from the set of reuse identifiers to the object.

[0022] In addition, the identifier allocation module 108 can generate a unique identifier for the object based on the assigned reuse identifier and the location identifier with which it is associated. The identifier allocation module 108 can append a location identifier to the reuse identifier to generate a unique identifier for the object. In an example, the location identifier can be based on the location of creation of the object. Further, the identifier allocation module 108 can select the location identifier based on the level at which the object is to be uniquely identified.

[0023] The various components of the identifier generation system 100 are described in detail in conjunction with Figure 1B.

[0024] As explained previously, generally, various techniques are employed for generating and allocating the identifier for each object on the DFS.

According to some known techniques, different parameters associated with the object can be employed to generate the identifier for the object. For instance, the timestamp of creation, the ID of the segment on which the object is created, the location identifier of the location in the segment in which the object is created, and randomly or sequentially allocated segment-level and data block-level identifiers can be used as the identifier for the object.

[0025] However, when used individually, such identifiers may not be sufficient for uniquely identifying the objects, leading to incorrect identification of objects and corruption of data associated with the objects. In an example, when the timestamp is used as an identifier, a reset of the system clock may cause the same identifier to be allocated to two or more objects. In addition, in geographically distributed file systems with servers located in different time zones, two or more objects may have the same timestamps and, therefore, the same identifiers.

[0026] Further, in case the above-mentioned parameters are used in combination, the identifier so generated may also not be adequate for uniquely identifying the object. For example, in case randomly allocated identifiers are used in combination with the location ID for identification, two or more objects may be allocated the same identifier. This may happen when a location ID in the same segment is reused for creating objects, for example, after the object having that location ID has been migrated to another location, for instance, another segment. When the location is reused, identifiers associated with the location and allocated once when the object was created can be reset. In such a case, two objects created with the same location ID, one of them created after the location is reused and the counter of the random identifiers has been reset, may end up with the same location ID and the same random identifier.
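The failure mode described in paragraph [0026] can be reproduced with a simplified model in which the per-location identifier is a plain counter rather than a random value (an assumption made for clarity; a random-number sequence that is reset to the same starting state behaves the same way). The function and parameter names are hypothetical. Resetting the counter on reuse produces a repeated (location ID, counter) pair, whereas carrying the counter across reuses, as the reuse identifiers of the present subject matter do, does not.

```python
from typing import Dict, List, Tuple


def issue_ids(locations: List[int], reset_on_reuse: bool) -> List[Tuple[int, int]]:
    """Create one object per entry of `locations` (the location ID it is created
    at) and return the (location ID, per-location counter) pair issued to each.

    Each location is assumed to have been freed, e.g. by migrating its object
    away, before it is used again. With reset_on_reuse=True the counter
    restarts at 0 whenever a location ID is reused.
    """
    counters: Dict[int, int] = {}
    issued: List[Tuple[int, int]] = []
    for loc in locations:
        if reset_on_reuse or loc not in counters:
            counters[loc] = 0   # first use, or counter reset on reuse
        else:
            counters[loc] += 1  # counter carried over from earlier uses
        issued.append((loc, counters[loc]))
    return issued
```

For example, `issue_ids([42, 42], reset_on_reuse=True)` issues the pair `(42, 0)` twice, while the same sequence with `reset_on_reuse=False` issues `(42, 0)` and then `(42, 1)`.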

[0027] Still other generally employed techniques involve allocation of an identifier for each object by a user creating the object. For example, the user may associate a media access control (MAC) address or a universally unique identifier (UUID) with the object being created. However, such an allocation may not be foolproof and the identifiers may be susceptible to modifications. As mentioned previously, in case the identifiers are unable to uniquely identify the objects at a global level in the DFS, there may be conflict during identification and access of objects leading to errors.

[0028] The present subject matter provides a robust and effective technique for generating unique identifiers for objects created in the DFS, involving relatively low computational resources and time for generation. The unique identifier can be employed for distinguishing various objects in the DFS at various levels, such as within a segment, across segments, and at the DFS level. In addition, the unique identifier generated in accordance with the present subject matter can uniquely identify the object, without errors, for a substantial period of operation of the DFS. According to an aspect, the substantial period can include the period for as long as not all the valid location identifiers for the location, for instance, the node or the segment, have been reused the allowable number of times, i.e., as long as the sets of reuse identifiers are not all exhausted. Subsequently, the identifier generation system 100 can instruct the DFS to stop creation of new objects in the location when all the location identifiers for the location are either occupied or marked as dead for further reuse.

[0029] Figure 1B illustrates a schematic of the identifier generation system 100 showing various components thereof, according to an example of the present subject matter. As shown in Figure 1B, in an example, the identifier generation system 100 can be a part of the distributed file system. The identifier generation system 100, among other things, may include the processor 102, modules 104, a memory 110, data 112, and interface(s) 114. The processor 102, among other capabilities, may fetch and execute computer-readable instructions stored in the memory 110. The memory 110, communicatively coupled to the processor 102, can include a non-transitory computer-readable medium including, for example, volatile memory, such as Static Random Access Memory (SRAM) and Dynamic Random Access Memory (DRAM), and/or non-volatile memory, such as Read Only Memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.

[0030] The interfaces 114 may include a variety of commercially available interfaces, for example, interfaces for peripheral device(s), such as data input output devices, referred to as I/O devices, storage devices, network devices, and intermediate power devices. The interfaces 114 may facilitate multiple communications within a wide variety of networks and protocol types, including wired networks and wireless networks.

[0031] As mentioned earlier, the identifier generation system 100 may include the modules 104. In an example, in addition to the location identifier management module 106 and the identifier allocation module 108, the modules 104 can include other module(s) 116. The other module(s) 116 may include computer-readable instructions that supplement applications or functions performed by the identifier generation system 100.

[0032] Further, the data 112 can include location identifier data 118, allocation data 120, and other data 122. The other data 122 may include data generated and saved by the modules 104 for providing various functionalities to the identifier generation system 100.

[0033] In addition, although not depicted in Figure 1B, the identifier generation system 100 can be coupled to a plurality of storage units of the DFS. For example, the identifier generation system 100 can be coupled to one or more segments of the DFS through one or more nodes of the DFS.

[0034] As explained above, the identifier generation system 100 can achieve generation of the unique identifier for the object being created and stored in the DFS, and the unique identifier so generated can be used to identify the object at various levels in the DFS. According to an aspect, the identifier generation system 100 can prevent re-allocation of previously assigned reuse identifiers associated with the location identifier with which the object is to be created and stored. Accordingly, the identifier generation system 100 can provide for the allocation of the unique identifiers to the objects in a manner that the objects can be consistently identified even over long periods of operation of the DFS.

[0035] As part of the operation of the identifier generation system 100, the location identifier management module 106 can assess whether the location identifier with which the object is to be created and stored is suitable for creating the object and for generating an appropriate unique identifier for it. For instance, the location identifier management module 106 can assess whether the location identifier is such that the object created with it will acquire a unique identifier capable of uniquely identifying the object in the DFS. In an example, the location identifier management module 106 can select the location identifier from a free list of available location identifiers. The free list can include location identifiers that have become available after an object has been deleted therefrom, as well as location identifiers that are fresh and have not been used previously.

[0036] Accordingly, in an example, the location identifier management module 106 can determine whether the location identifier in the segment of the DFS with which the object is to be created is being used for creating and storing an object for the first time, or whether the location identifier was previously used for creation of an object. In an example, the information as to whether the location identifier has been used previously can be stored in the location identifier data 118. For instance, the location identifier management module 106 can ascertain whether any of the reuse identifiers from the set associated with that location identifier has been allocated previously, to determine whether the location identifier has been used previously. Consider a scenario where the location identifier can be reused 10 times, after which the location identifier management module 106 stops using that location identifier. The location identifier management module 106 can maintain a counter that goes from 0 to 10 and is incremented each time an object is created with that location identifier. A value of 0 means the location identifier has never been used previously, and any value from 1 through 9 means the location identifier can be reused. Once the counter reaches 10, the location identifier management module 106 does not use the location identifier any further.
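The 0-to-10 counter scenario can be sketched as below. The state names `fresh`, `reusable`, and `dead` and both function names are hypothetical labels for the three cases the paragraph distinguishes; `MAX_USES = 10` is taken from the scenario itself.

```python
MAX_USES = 10  # from the scenario: the counter runs from 0 to 10


def location_state(counter: int) -> str:
    """Classify a location identifier by its usage counter."""
    if not 0 <= counter <= MAX_USES:
        raise ValueError("counter out of range")
    if counter == 0:
        return "fresh"     # never used previously
    if counter < MAX_USES:
        return "reusable"  # values 1 through 9: the location identifier can be reused
    return "dead"          # counter reached 10: do not use any further


def create_object(counter: int) -> int:
    """Create an object with the location identifier and return the new counter."""
    if location_state(counter) == "dead":
        raise RuntimeError("location identifier is exhausted")
    return counter + 1
```

Starting from 0, ten successive calls to `create_object` bring the counter to 10, after which the location identifier is reported as `dead` and further creation is refused.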

[0037] In an example, the reuse identifiers may be allocated sequentially, for a given location identifier, until the set of identifiers is exhausted. In case the location identifier has not been used before, the location identifier management module 106 can allow creation of the object with that location identifier. Accordingly, the identifier allocation module 108, in such a case, can assign a reuse identifier from the set of reuse identifiers, which in this case is entirely unused, to the object being created by the DFS in a particular segment. In an example, the identifier allocation module 108 can allocate the reuse identifier from the set based on random allocation, sequential allocation, deterministic allocation, or other allocation techniques. The identifier allocation module 108 can store the allocated reuse identifier, along with the location identifier with which the reuse identifier is associated, in the allocation data 120.

[0038] For instance, the set of reuse identifiers can include a range of numbers available for allocation to objects being created by the DFS in a segment with a given location identifier, the range being based on the bit-size of the reuse identifier within the entire unique identifier. For example, when the unique identifier has a bit-size of about 128 bits, the reuse identifier can have a bit-size of about 32 bits, and the set can then contain about 2³² available reuse identifiers.
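The arithmetic of the 128-bit example can be checked with a short bit-packing sketch. Beyond the text, it assumes that the reuse identifier occupies the low 32 bits and that the remaining 96 bits hold the location identifier (which, as described for the DFS-level identifier, may itself combine a segment identifier and an inode number); `pack` and `unpack` are hypothetical helpers.

```python
from typing import Tuple

TOTAL_BITS = 128                          # bit-size of the unique identifier
REUSE_BITS = 32                           # bit-size of the reuse identifier
LOCATION_BITS = TOTAL_BITS - REUSE_BITS   # assumed: remaining bits hold the location id
REUSE_SET_SIZE = 1 << REUSE_BITS          # number of reuse identifiers per location id


def pack(location_id: int, reuse_id: int) -> int:
    """Append the reuse identifier (low bits) to the location identifier (high bits)."""
    if not 0 <= location_id < (1 << LOCATION_BITS):
        raise ValueError("location identifier does not fit")
    if not 0 <= reuse_id < REUSE_SET_SIZE:
        raise ValueError("reuse identifier does not fit")
    return (location_id << REUSE_BITS) | reuse_id


def unpack(uid: int) -> Tuple[int, int]:
    """Split a packed identifier back into (location identifier, reuse identifier)."""
    return uid >> REUSE_BITS, uid & (REUSE_SET_SIZE - 1)


print(REUSE_SET_SIZE)  # prints: 4294967296
```

With this layout, two objects that share a location identifier but carry different reuse identifiers always pack to different 128-bit values.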

[0039] Further, in case the location identifier management module 106 determines that the location identifier has already been used previously, the location identifier management module 106 further determines whether all the reuse identifiers in the set associated with the location identifier have already been assigned to objects previously created with the location identifier. In an example, the location identifier management module 106 can access the allocation data 120 associated with the location identifier to determine how many identifiers from the set have already been used.

[0040] When the location identifier management module 106 determines that not all the reuse identifiers from the set have been used and some are still unallocated, the identifier allocation module 108 can allow the creation of the object with that location identifier and allocate a reuse identifier from among the unallocated reuse identifiers. In an example, the identifier allocation module 108 can determine the reuse identifier from among the unallocated reuse identifiers in the set based on a preceding reuse identifier allocated to a previous object having the location identifier. For instance, in case the reuse identifier is a numeric identifier, the identifier allocation module 108 can sequentially increment the numeric identifier by a predetermined amount, for example, by 1. In an example, the reuse identifier can be a counter that indicates the number of times the location identifier has already been used by the DFS for creating and storing objects in the segment. In another case, the identifier allocation module 108 can achieve random allocation of the reuse identifier to the object for that location identifier. In yet another example, the identifier allocation module 108 can achieve a deterministic allocation of the reuse identifier to the object from among the set of reuse identifiers. According to one example, the identifier allocation module 108 may select the reuse identifier based on a heuristic and accompanying logic that ensures that a reuse identifier is not allocated to objects more than once.
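Two of the allocation options, deterministic and random, can be sketched so that no reuse identifier is handed out twice. Both are assumptions about how such allocation might be done, not the described system's method. The deterministic order uses an affine map with an odd multiplier modulo a power of two, which is a bijection on the set; the random allocator records issued values and redraws on collision, as one possible form of the accompanying logic. The 16-bit set size, the constants, and all names are hypothetical.

```python
import random
from typing import Optional, Set

SET_SIZE = 1 << 16  # assumed small set of reuse identifiers, for illustration


def deterministic_reuse_id(n: int, multiplier: int = 40503, offset: int = 12345) -> int:
    """Return the n-th reuse identifier for a location in a fixed scrambled order.

    x -> (multiplier * x + offset) mod 2**k is a bijection on 0 .. 2**k - 1 when
    the multiplier is odd, so each identifier is produced exactly once for
    n = 0 .. SET_SIZE - 1.
    """
    if multiplier % 2 == 0:
        raise ValueError("multiplier must be odd")
    if not 0 <= n < SET_SIZE:
        raise ValueError("set of reuse identifiers exhausted")
    return (multiplier * n + offset) % SET_SIZE


class RandomReuseAllocator:
    """Random allocation that records issued values so that none is used twice."""

    def __init__(self, set_size: int = SET_SIZE, seed: Optional[int] = None) -> None:
        self.set_size = set_size
        self.issued: Set[int] = set()
        self.rng = random.Random(seed)

    def allocate(self) -> int:
        if len(self.issued) == self.set_size:
            raise ValueError("set of reuse identifiers exhausted")
        while True:  # redraw on collision; gets slower as the set fills up
            candidate = self.rng.randrange(self.set_size)
            if candidate not in self.issued:
                self.issued.add(candidate)
                return candidate
```

The deterministic variant needs only the count of identifiers already issued, whereas the random variant must keep a record of every issued value, which is a larger amount of allocation data to persist.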

[0041] Further, the identifier allocation module 108 can generate a unique identifier for the object based on the allocated location identifier along with its associated reuse identifier. In an example, the identifier allocation module 108 can append at least a reuse identifier to the location identifier to generate the unique identifier for the object. The location identifier can be based on, for example, indicative of, the location of creation of the object. In one instance, the location identifier can be indicative of the inode in the segment in which the object is created, and in such a case, the location identifier can be an inode number (inum).

[0042] In another instance, the identifier allocation module 108 can append an auxiliary identifier to the unique identifier to create a DFS-level unique identifier that uniquely identifies the object across the distributed file system. For example, the identifier allocation module 108 can prefix the segment identifier of the segment on which the object is created to the unique identifier, i.e., to the location identifier, such as the inode number (inum) allocated for the object, together with the reuse identifier. Thus, the DFS-level unique identifier can include the segment identifier prefixed to the location identifier, such as the inode number (inum), which is, in turn, prefixed to the reuse identifier, such as the inode generation number.

[0043] In another example, the location identifier can be a combination of a plurality of identifiers, each identifier being for a different level in the DFS. For example, the location identifier can be a combination of the identifier of the segment and the identifier of the inode in the segment in which the object is created. Accordingly, the unique identifier can uniquely identify the object across levels of the DFS. According to an aspect, the reuse identifier can be a base-level identifier on the basis of which the object can be identified across the DFS at different levels, when the identifier allocation module 108 appends other identifiers, such as the location identifier, to the reuse identifier.

[0044] In an example, the identifier allocation module 108 can assign the reuse identifiers according to the above-mentioned scheme of incrementing the previously used reuse identifier by 1 when allocating a new reuse identifier; therefore, the reuse identifier changes in the direction of increasing numeric value. In such a case, the present subject matter can provide another way of ensuring that the allocation of the identifiers is done in a fail-safe manner. For instance, the location identifier management module 106 may re-check whether the reuse identifier obtained after incrementing the previously used identifier is within the bounds of the maximum allowed value for the reuse identifier. In case the identifier rolls over the maximum allowed value, the set of reuse identifiers for the location identifier has been exhausted. In another example, in case the reuse identifiers are allocated sequentially, the reuse identifiers for the location identifier have been exhausted when the value of the reuse identifier reaches the maximum available value.
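The fail-safe re-check can be sketched by emulating a 32-bit unsigned field (the width is an assumption taken from the example in paragraph [0038]): the increment wraps to 0 on overflow, so a result that is not greater than the preceding value signals a roll-over and therefore an exhausted set. The function name is hypothetical, and the preceding value is assumed to already lie within the field.

```python
from typing import Optional

REUSE_BITS = 32                    # assumed width of the reuse identifier field
REUSE_MAX = (1 << REUSE_BITS) - 1  # largest value the field can hold


def increment_reuse_id(previous: int) -> Optional[int]:
    """Increment the preceding reuse identifier by 1 as a fixed-width field would,
    then re-check the result. Returns None when the set is exhausted."""
    candidate = (previous + 1) & REUSE_MAX  # wraps to 0 on overflow, like a uint32
    if candidate <= previous:
        # The value rolled over the maximum: every reuse identifier has been used.
        return None
    return candidate
```

Checking the result after the increment, rather than trusting the increment itself, catches the wrap-around that fixed-width arithmetic would otherwise hide.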

[0045] According to an aspect of the present subject matter, when the location identifier management module 106 determines that each reuse identifier in the set of reuse identifiers has already been allocated once for the location identifier under consideration, the location identifier management module 106 marks that location identifier as unusable for further creation and storage of objects. Accordingly, the location identifier management module 106 can mark the location identifier as dead for further use. Further, the location identifier management module 106 can store the information that the location identifier has been marked as dead in the location identifier data 118.

[0046] Subsequently, the location identifier management module 106 can prompt the DFS to select another location identifier for the creation of the object. The identifier generation system 100, i.e., the location identifier management module 106 and the identifier allocation module 108, can perform the above-mentioned assessment for the other location identifier. This assessment determines whether the other location identifier is suitable for the creation and storage of the object, thereby ensuring that the unique identifier generated for the object can uniquely identify the object at various levels in the DFS.

[0047] In addition, the identifier generation system 100 can provide for crash recovery in an effective manner. For example, in the event of a crash of a node or a segment of the DFS, the identifier generation system 100 can allocate to the objects in the node or the segment the same reuse identifiers and unique identifiers that they had before the crash.

[0048] The above described scheme of allocating the identifiers and then generating the identifiers by the identifier generation system 100 is explained further with reference to the following example.

[0049] Consider an object in segment 1 with the location identifier 100 and the reuse identifier 200, where location identifier 100 has an associated set of reuse identifiers ranging from 0 to 5000. In such a case, the unique identifier of the object can be 1:100:200, where 1:100 depicts the location identifier of the object. When the object is migrated to another segment, segment 2, the location identifier 100 in segment 1 becomes available for creating and storing another object. The migrated object retains the unique identifier originally allocated to it.

[0050] When another object is to be created with location identifier 100, the location identifier management module 106 determines whether any reuse identifier from the set is unused. In the present case, the location identifier still has reuse identifiers available for allocation, so the identifier allocation module 108 allocates the reuse identifier 201 to the subsequent object created with the location identifier 100. For instance, the unique identifier for the subsequent object can be 1:100:201. As objects are migrated or deleted and then recreated with the location identifier 100 on segment 1, the reuse identifier keeps increasing through 202, 203, 204, and so on, and eventually reaches 5000.

[0051] Subsequently, when an object is to be created with location identifier 100, the location identifier management module 106 determines that all the reuse identifiers for the location identifier 100 have been exhausted. It can do so, for instance, based on the previously allocated reuse identifier 5000. Accordingly, the location identifier management module 106 marks the location identifier 100 as dead, and the location identifier is not used further for creation and storage of objects. Conversely, suppose the location identifier management module 106 did not mark location identifier 100 as dead. Then, for subsequent objects created with location identifier 100, the reuse identifier would roll over, restart from 0, eventually reach 200 again, and then proceed toward 5000. In such a case, two objects in the DFS, across segments, could have the same unique identifier. For example, the object with unique identifier 1:100:200 that previously migrated from segment 1 to segment 2 would collide with a newly created object that is again allocated 1:100:200. The present subject matter prevents this from occurring and maintains the uniqueness of object identifiers at various levels in the system.
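The example in [0049] to [0051] can be simulated in Python to show why marking the location identifier dead matters. The sketch issues identifiers at segment 1, location identifier 100, starting from reuse identifier 200 with a maximum of 5000. With dead marking, issuing stops after 5000. Without it, the reuse identifier rolls over to 0 and eventually reissues 1:100:200. The function `issue_uids` and its parameters are hypothetical.

```python
def issue_uids(start: int, max_reuse_id: int, count: int, mark_dead: bool) -> list[str]:
    """Issue up to `count` unique identifiers at segment 1, location identifier 100.

    mark_dead=True stops once max_reuse_id has been used (location 100 is dead).
    mark_dead=False rolls the reuse identifier over to 0, which the text warns against.
    """
    uids: list[str] = []
    reuse = start
    for _ in range(count):
        uids.append(f"1:100:{reuse}")
        if reuse == max_reuse_id:
            if mark_dead:
                break   # every reuse identifier was assigned once
            reuse = 0   # roll over and restart from 0
        else:
            reuse += 1  # next object recreated at the same location
    return uids


safe = issue_uids(200, 5000, 10_000, mark_dead=True)
unsafe = issue_uids(200, 5000, 5002, mark_dead=False)
print(safe[-1], len(safe))                       # 1:100:5000 4801
print(unsafe[-1], len(unsafe) - len(set(unsafe)))  # 1:100:200 1
```

In the unsafe run, the last identifier duplicates the one held by the object that migrated to segment 2. That duplicate is the cross-segment collision described in [0051].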

[0052] In addition, the location identifier management module 106 can cease creation of new objects in the DFS, for instance, on all the nodes and segments, when all valid location identifiers for the DFS have been marked dead. In one case, the location identifier management module 106 can determine whether all location identifiers for the DFS, for instance, within a segment or across segments, are either occupied or marked dead, and cease creation of the object in the DFS accordingly.
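The ceasing condition in [0052] reduces to a simple predicate, sketched here in Python. The sketch assumes that each location identifier is tracked in one of three states. The enum `SlotState` and the function `must_cease_creation` are hypothetical.

```python
from enum import Enum


class SlotState(Enum):
    """Hypothetical state of one location identifier (e.g., an inode slot)."""

    FREE = "free"          # never used, or reusable with reuse identifiers left
    OCCUPIED = "occupied"  # currently holds an object
    DEAD = "dead"          # every reuse identifier has been assigned once


def must_cease_creation(slots: dict[int, SlotState]) -> bool:
    """True when every location identifier is either occupied or marked dead."""
    return all(state is not SlotState.FREE for state in slots.values())
```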

[0053] Methods 200 and 300 are described in Figure 2 and Figure 3, respectively, for generating a unique identifier for an object created in a distributed file system (DFS), according to an example of the present subject matter. Figure 2 illustrates an overview of the method 200 for generating the unique identifier for the object created in the DFS, while Figure 3 illustrates a detailed method 300 for generating the unique identifier for the object created in the DFS.

[0054] The order in which the methods 200 and 300 are described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any appropriate order to implement the methods 200 and 300 or an alternative method. Additionally, individual blocks may be deleted from the methods 200 and 300 without departing from the spirit and scope of the subject matter described herein.

[0055] The methods 200 and 300 can be performed by programmed computing devices, for example, based on instructions retrieved from non-transitory computer readable media. The computer readable media can include machine-executable or computer-executable instructions to perform all or portions of the described methods. The computer readable media may be, for example, digital memories, magnetic storage media such as magnetic disks and magnetic tapes, hard drives, or optically readable data storage media.

[0056] The methods 200 and 300 may be performed by a computing system, such as the identifier generation system 100. For the sake of brevity of description of Figure 2 and Figure 3, the components of the identifier generation system 100 performing the various steps of the methods 200 and 300 are not described in detail with reference to Figure 2 and Figure 3. Such details are provided in the description provided with reference to Figure 1A and Figure 1B.

[0057] Referring to Figure 2, at block 202, it is determined whether a location identifier for creation of the object in the distributed file system is being reused. For instance, the location identifier may refer to a storage location in a segment in the DFS. An object with that location identifier may previously have been deleted or migrated, or no object may ever have been created with that location identifier.

[0058] At block 204, based on the determining, i.e., whether the location identifier is being reused or not, it is ascertained whether each identifier from the set of reuse identifiers available for allocation for that location identifier has been assigned once to objects created with that location identifier. Each such identifier is referred to as a reuse identifier. In one example, the reuse identifier can be an identifier associated with the location of creation of the object but not indicative of the location. For instance, in case the location identifier has not been used before, i.e., is not being reused, the ascertaining at block 204 may not be performed, since in such a case none of the reuse identifiers from the set has been assigned.

[0059] At block 206, based on whether the set of reuse identifiers has been exhausted or not, the object can be assigned a reuse identifier, for example, after the object is created at the location. For instance, in case each identifier from the set of reuse identifiers has not been assigned once, the object is allocated a reuse identifier from among the set of reuse identifiers. In such a case, the reuse identifier to be allocated is determined based on a preceding reuse identifier allocated to a previous object having the location identifier, while still being drawn from the set of reuse identifiers.

[0060] At block 208, a unique identifier is generated for the object based at least on the allocated reuse identifier along with the location identifier with which it is associated. In an example, the location identifier is appended with at least the reuse identifier to generate the unique identifier for the object. In an example, the location identifier can be a combination of a plurality of identifiers, each of the plurality of identifiers being indicative of a different level of the DFS, to indicate the location of creation of the object.
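Blocks 202 to 208 of method 200 can be traced by a short Python sketch. The sketch tracks a single location identifier. It makes two assumptions: a location that was never used starts at reuse identifier 0, and allocation is sequential from the preceding value. The `LocationState` class, its default bound of 5000, and the `generate_uid` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocationState:
    """Hypothetical bookkeeping for one location identifier."""

    last_reuse_id: Optional[int] = None  # None: the location has never been used
    max_reuse_id: int = 5000


def generate_uid(location_id: int, state: LocationState) -> Optional[str]:
    # Block 202: is the location identifier being reused?
    if state.last_reuse_id is None:
        reuse_id = 0  # not reused, so block 204 is not needed
    else:
        # Block 204: has each reuse identifier in the set been assigned once?
        if state.last_reuse_id >= state.max_reuse_id:
            return None  # exhausted; the caller must pick another location
        # Block 206: derive from the preceding reuse identifier.
        reuse_id = state.last_reuse_id + 1
    state.last_reuse_id = reuse_id
    # Block 208: location identifier followed by the reuse identifier.
    return f"{location_id}:{reuse_id}"
```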

[0061] Referring to Figure 3, at block 302, creation of an object in the DFS is initiated.

[0062] At block 304, it is determined whether all the location identifiers for the DFS in which the object is to be created are either occupied or unusable. The location identifiers considered can be those of a node or a segment, or those across the nodes or segments. A location identifier may be marked as dead, i.e., unusable.

[0063] If at block 304 it is determined that all the location identifiers for the DFS are either occupied or unusable (YES path from block 304), then at block 306, the creation of the object in the DFS is ceased. In such a case, the DFS is indicated to cease creation of new objects.

[0064] If at block 304 it is determined that the location identifiers for the DFS are not all occupied or unusable (NO path from block 304), then at block 308, a location identifier can be determined for the object, for instance, based on the location. In an example, the location identifier can be based on, for example, indicative of, the location of creation of the object. In one instance, the location identifier can be indicative of an inode slot in the segment in which the object is created, and in such a case, the location identifier can be an inode number (inum).

[0065] At block 310, it is determined whether the location identifier for creation of the object in the distributed file system is being reused or not.

[0066] In case the location identifier is not being reused, i.e., the location identifier has not been used earlier for creating and storing an object (NO path from block 310), then at block 312, a reuse identifier is allocated to the object. The reuse identifier is taken from a set of reuse identifiers associated with that location identifier for allocation to the objects being created with that location identifier. In an example, the reuse identifier can be allocated based on random allocation, sequential allocation, or heuristics. Subsequently, as explained later, a unique identifier is generated for the object based on the allocated reuse identifier.
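Two of the allocation policies named in [0066], sequential and random, can be contrasted in Python. Both functions below draw only from values that have not yet been assigned at the location, and both return None once the set is exhausted. The sketch assumes the set of reuse identifiers is 0 to `max_reuse_id` inclusive and that the caller records each allocated value in `used`. The function names are hypothetical, and heuristic allocation is not shown.

```python
import random
from typing import Optional


def allocate_sequential(used: set[int], max_reuse_id: int) -> Optional[int]:
    # Next value after the highest one handed out so far (0 for a fresh location).
    candidate = max(used) + 1 if used else 0
    return candidate if candidate <= max_reuse_id else None


def allocate_random(used: set[int], max_reuse_id: int, rng: random.Random) -> Optional[int]:
    # Choose uniformly among the values never assigned at this location.
    unused = [r for r in range(max_reuse_id + 1) if r not in used]
    return rng.choice(unused) if unused else None
```

Sequential allocation needs only the last value, which matches the "preceding reuse identifier" rule in [0070]. Random allocation, by contrast, has to remember every value already issued, so its bookkeeping grows with the size of the set.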

[0067] In case the location identifier is being reused, i.e., objects have earlier been created with the location identifier and then deleted or moved (YES path from block 310), then at block 314, it is further ascertained whether each reuse identifier from the set of reuse identifiers available for allocation for that location identifier has been assigned once to objects created with that location identifier. In other words, at block 314, it is determined whether the entire set of reuse identifiers has been exhausted for that location identifier.

[0068] If all the reuse identifiers in the set of reuse identifiers have been allocated (YES path from block 314), then at block 316, the location identifier is marked as dead for further use and is not used for creating or storing objects any further.

[0069] Further, at block 318, creation of the object is initiated with another location identifier, and the assessment from block 304 onwards is performed for the other location identifier.

[0070] However, if all the reuse identifiers from the set have not been allocated (NO path from block 314), then the creation of the object with that location identifier can be allowed. Accordingly, at block 320, the reuse identifier from among the unallocated reuse identifiers in the set is allocated to the object being created. In an example, the reuse identifier is determined from among the unallocated reuse identifiers in the set, based on a preceding reuse identifier allocated to a previous object having the location identifier.

[0071] At block 322, the unique identifier can be generated for the object by appending at least the reuse identifier to the location identifier. In another instance, a segment identifier for the segment on which the object is created can be prefixed to the location identifier. Thus, in the latter case, the DFS level unique identifier can include the segment identifier prefixed to the location identifier, such as the inode number (inum), which is in turn prefixed to the reuse identifier, such as the inode generation number.
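The full flow of method 300, blocks 304 to 322, can be sketched in Python for a single segment. The sketch makes several assumptions that are not in the source:
- Location identifiers are inode slots 0 to `num_slots - 1`.
- The lowest-numbered free slot is tried first.
- A fresh slot starts at reuse identifier 0.
- Reuse identifiers are allocated sequentially.

The class `Segment` and its methods `create_object` and `release` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    """Hypothetical in-memory model of one segment's inode slots."""

    segment_id: int
    max_reuse_id: int
    num_slots: int = 4
    last_reuse: dict[int, int] = field(default_factory=dict)  # inum -> last reuse id
    occupied: set[int] = field(default_factory=set)
    dead: set[int] = field(default_factory=set)

    def create_object(self) -> Optional[str]:
        while True:
            free = [i for i in range(self.num_slots)
                    if i not in self.occupied and i not in self.dead]
            # Blocks 304/306: cease creation if every slot is occupied or dead.
            if not free:
                return None
            inum = free[0]                   # block 308: choose a location identifier
            last = self.last_reuse.get(inum)  # block 310: reused if a history exists
            if last is None:
                reuse = 0                    # block 312: first use of this slot
            elif last >= self.max_reuse_id:
                self.dead.add(inum)          # blocks 314/316: exhausted, mark dead
                continue                     # block 318: retry with another slot
            else:
                reuse = last + 1             # block 320: from the preceding reuse id
            self.last_reuse[inum] = reuse
            self.occupied.add(inum)
            # Block 322: segment id prefixed to inum, prefixed to reuse id.
            return f"{self.segment_id}:{inum}:{reuse}"

    def release(self, inum: int) -> None:
        """Object deleted or migrated away; the slot may be reused."""
        self.occupied.discard(inum)
```

The loop always terminates, because each `continue` adds a slot to `dead` and shrinks the list of free slots. Once every slot is occupied or dead, `create_object` returns None, mirroring the ceasing of creation at block 306.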

[0072] Figure 4 illustrates an example network environment 400 implementing a non-transitory computer readable medium 402 for generating a unique identifier for an object created in a distributed file system, according to an example of the present subject matter. The network environment 400 may be a public networking environment or a private networking environment. In one implementation, the network environment 400 includes a processing resource 404 communicatively coupled to the non-transitory computer readable medium 402 through a communication link 406.

[0073] For example, the processing resource 404 can be a processor of a computing system, such as the identifier generation system 100. The non-transitory computer readable medium 402 can be, for example, an internal memory device or an external memory device. In one implementation, the communication link 406 may be a direct communication link, such as one formed through a memory read/write interface. In another implementation, the communication link 406 may be an indirect communication link, such as one formed through a network interface. In such a case, the processing resource 404 can access the non-transitory computer readable medium 402 through a network 408. The network 408 may be a single network or a combination of multiple networks and may use a variety of communication protocols.

[0074] The processing resource 404 and the non-transitory computer readable medium 402 may also be communicatively coupled to data sources 410 over the network 408. The data sources 410 can include, for example, databases and computing devices. The data sources 410 may be used by the database administrators and other users to communicate with the processing resource 404.

[0075] In one implementation, the non-transitory computer readable medium 402 includes a set of computer readable instructions, such as the location identifier management module 106 and the identifier allocation module 108. The set of computer readable instructions, referred to as instructions hereinafter, can be accessed by the processing resource 404 through the communication link 406 and subsequently executed to perform acts for generating the unique identifier for the object.

[0076] For discussion purposes, the execution of the instructions by the processing resource 404 has been described with reference to various components introduced earlier with reference to the description of Figure 1A and Figure 1B.

[0077] On execution by the processing resource 404, the location identifier management module 106 can determine whether the location identifier in a segment in the DFS with which the object is to be created is being used for creating and storing an object for the first time, or whether the location identifier was previously used for creation of an object. Based on the determining, the location identifier management module 106 further ascertains whether the reuse identifiers in the set associated with the location identifier and available for allocation have all already been assigned to objects previously created with the location identifier. For instance, the location identifier management module 106 makes this ascertaining in case the location identifier has already been used previously.

[0078] According to an aspect of the present subject matter, when the location identifier management module 106 determines that each reuse identifier in the set of reuse identifiers has already been allocated once for the location identifier under consideration, the location identifier management module 106 marks that location identifier as unusable for further creation and storage of objects. Accordingly, the location identifier management module 106 can mark the location identifier as dead for further use.

[0079] Subsequently, the location identifier management module 106 can prompt the DFS to select another location identifier for the creation of the object. Further, the identifier generation system 100, i.e., the location identifier management module 106 and the identifier allocation module 108, can perform the above-mentioned assessment for the other location identifier to determine whether the other location identifier is suitable, thereby ensuring that the unique identifier generated for the object being created can uniquely identify the object at various levels in the DFS.

[0080] Accordingly, the location identifier management module 106 determines whether the other location identifier being used for creating and storing the object is being reused or not. On the basis of whether the other location identifier is being reused, the location identifier management module 106 further determines whether the reuse identifiers in the set associated with the other location identifier have all already been assigned to objects previously created with the other location identifier. For instance, the location identifier management module 106 can make this assessment regarding the set of reuse identifiers of the other location identifier in case the other location identifier has already been used previously.

[0081] Further, when the location identifier management module 106 determines that not all the reuse identifiers from the set have been used and some are still unallocated, the identifier allocation module 108 can allow the creation of the object with the other location identifier and allocate a reuse identifier from among the unallocated reuse identifiers. Further, the identifier allocation module 108 can generate a unique identifier for the object based on the allocated reuse identifier along with the location identifier with which it is associated. In an example, the identifier allocation module 108 can append at least a reuse identifier to the location identifier to generate the unique identifier for the object.

[0082] Although implementations for generating the unique identifier for the object created in the distributed file system have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as example implementations for generating the unique identifier for the object created in the distributed file system.

Claims

claim:

1. A method for generating a unique identifier for an object in a distributed file system, the method comprising:

determining whether a location identifier for creation of the object in the distributed file system is being reused;

ascertaining, based on the determining, whether each reuse identifier from a set of reuse identifiers available for allocation for the location identifier is assigned once;

assigning a reuse identifier to the object based on the ascertaining; and

generating the unique identifier for the object based on at least the location identifier and the reuse identifier.

2. The method as claimed in claim 1, wherein the assigning comprises allocating the reuse identifier, from among the set of reuse identifiers available for allocation for the location identifier, when the each reuse identifier from the set of reuse identifiers is not assigned once.

3. The method as claimed in claim 2, wherein the allocating comprises determining the reuse identifier for allocation from the set of reuse identifiers based on a preceding reuse identifier allocated to a previous object having the location identifier.

4. The method as claimed in claim 1, wherein the assigning comprises:

marking the location identifier as dead for further use when the each reuse identifier from the set of reuse identifiers is assigned once;

initiating creation of the object using another location identifier;

assessing whether the other location identifier is being reused; and

determining, based on the assessing, whether each reuse identifier from the set of reuse identifiers available for allocation for the other location identifier is assigned once.

5. The method as claimed in claim 1, wherein the assigning comprises allocating the reuse identifier from among the set of reuse identifiers, when the location identifier is not being reused, wherein the reuse identifier is allocated based on one of random allocation, deterministic allocation, and sequential allocation.

6. The method as claimed in claim 1, further comprising:

determining whether all location identifiers for the distributed file system are at least one of occupied and marked dead; and

ceasing creation of the object, based on the determining.

7. An identifier generation system for generating a unique identifier for an object created in a distributed file system, the identifier generation system comprising:

a processor;

a location identifier management module coupled to the processor to, determine whether a location identifier for creation of the object at a location in the distributed file system is to be reused; and

ascertain whether each reuse identifier from a set of reuse identifiers available for allocation for the location identifier is assigned once, based on the determining; and

an identifier allocation module to,

allocate a reuse identifier to the object based on the ascertaining; and

append at least a location identifier to the reuse identifier to generate the unique identifier for the object, wherein the location identifier is based on the location of creation of the object.

8. The identifier generation system as claimed in claim 7, wherein the identifier allocation module is to allocate the reuse identifier from among the set of reuse identifiers available for allocation for the location identifier, when the location identifier management module is to determine non-assignment of at least one reuse identifier from the set of reuse identifiers.

9. The identifier generation system as claimed in claim 7, wherein the identifier allocation module is to append an auxiliary identifier to the unique identifier to create a DFS level unique identifier to uniquely identify the object across levels of the distributed file system.

10. The identifier generation system as claimed in claim 7, wherein the location identifier management module is to mark the location identifier as dead for further use when the each reuse identifier from the set of reuse identifiers is assigned once.

11. The identifier generation system as claimed in claim 10, wherein the location identifier management module is to,

ascertain whether another location identifier for creation of the object is to be reused; and

assess, based on the ascertaining, whether each reuse identifier from the set of reuse identifiers available for allocation for the other location identifier is assigned once.

12. A non-transitory computer-readable medium comprising instructions executable by a processing resource to:

ascertain whether a location identifier for creation of an object at a location in a distributed file system is to be reused;

determine, in response to the ascertaining, whether each reuse identifier from a set of reuse identifiers available for allocation for the location identifier is assigned once;

mark the location identifier as dead for further use when the each reuse identifier from the set of reuse identifiers is assigned once;

assess another location identifier for creating the object, wherein the assessment is to ascertain whether the other location identifier is to be reused and to determine whether each reuse identifier from the set of reuse identifiers available for allocation for the other location identifier is assigned once;

assign a reuse identifier for the other location identifier to the object based on the assessment; and

generate the unique identifier for the object based on the other location identifier and the reuse identifier for the other location identifier.

13. The non-transitory computer-readable medium as claimed in claim 12 further comprising instructions executable by the processor to allocate the reuse identifier from among the set of reuse identifiers available for allocation for the other location identifier, when at least one reuse identifier from the set of reuse identifiers has not been allocated before.

14. The non-transitory computer-readable medium as claimed in claim 13 further comprising instructions executable by the processor to ascertain the reuse identifier for allocation from the set of reuse identifiers for the other location identifier based on a preceding reuse identifier allocated to a previous object associated with the other location identifier.

15. The non-transitory computer-readable medium as claimed in claim 12 further comprising instructions executable by the processor to append at least the reuse identifier for the other location identifier to the other location identifier to generate the unique identifier for the object.