CN106777062B - Method and device for managing metadata - Google Patents

Method and device for managing metadata

Info

Publication number
CN106777062B
Authority
CN
China
Prior art keywords
virtual
virtual directory
metadata
directory
groups
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611139129.6A
Other languages
Chinese (zh)
Other versions
CN106777062A (en
Inventor
李雪生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201611139129.6A
Publication of CN106777062A
Application granted
Publication of CN106777062B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/188 Virtual file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/119 Details of migration of file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/164 File meta data generation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for managing metadata. The method comprises: for at least two groups corresponding to a directory, determining the mapping range corresponding to each group; storing the metadata corresponding to each file in the group whose mapping range matches that metadata; and, when the storage capacity of the directory is detected to be outside a preset threshold range, establishing a first number of virtual directories and migrating all currently existing groups according to a group migration rule. Because the massive metadata is stored group by group, when the storage capacity of the directory exceeds the limit, virtual directories can be added and groups migrated to them, so that the metadata is stored in groups under the virtual directories and the over-limit problem is resolved. The scheme can therefore improve metadata retrieval efficiency.

Description

Method and device for managing metadata
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for managing metadata.
Background
With the arrival of the big data era, the volume of unstructured data being generated has grown explosively. For example, a large-scale sensor may produce a large number of data segments, forming an enormous number of small data files. The generated mass data can be stored as files, which facilitates data sharing and management.
At present, the metadata corresponding to files can be stored in large quantities in a single directory of a file system.
However, as metadata is continuously stored, the amount of metadata in a single directory tends to become too large. Consequently, when a target file must be located by retrieving its metadata, the existing metadata management approach reduces metadata retrieval efficiency.
Disclosure of Invention
The invention provides a method and a device for managing metadata, which can improve the retrieval efficiency of the metadata.
To achieve this purpose, the invention is realized through the following technical solutions:
In one aspect, the present invention provides a method for managing metadata, including:
S1: for at least two groups corresponding to a directory, determining the mapping range corresponding to each group;
S2: performing the following for the metadata corresponding to each file: storing the metadata in the group corresponding to the mapping range that matches the metadata;
S3: when the storage capacity of the directory is detected not to be within a preset first threshold range, executing S4;
S4: establishing a first number of virtual directories, and migrating all currently existing groups according to a predetermined group migration rule.
Further, the metadata corresponding to each file includes the name of the file;
S2 includes performing the following for the metadata corresponding to each file: calculating a hash value from the name of the file included in the metadata; determining a target mapping range that matches the hash value, that is, the mapping range in which the hash value lies; and storing the metadata in the group corresponding to the target mapping range.
Further, the storage capacity of the directory conforms to formula one:
$$X = \sum_{i=1}^{N} N_i$$
where $X$ is the storage capacity of the directory, $N$ is the number of the at least two groups, and $N_i$ is the number of metadata entries stored in the $i$-th of the at least two groups.
Further, the first number is the minimum number of virtual directories such that, after the migration processing, the storage capacity of every currently existing virtual directory lies within the first threshold range.
Further, after S4, the method further includes: when the storage capacity of any virtual directory is detected not to be within the first threshold range, executing S4.
Further, the group migration rule includes: with all currently existing groups arranged in order and all currently existing virtual directories arranged in order, determining the number of groups corresponding to each virtual directory according to formula two;
when the number of all currently existing virtual directories equals the first number, migrating the at least two groups to the currently existing virtual directories according to the determined number of groups for each virtual directory, wherein, for any two virtual directories, when the first virtual directory is ordered after the second virtual directory, every group in the first virtual directory is ordered after every group in the second virtual directory;
when the number of all currently existing virtual directories does not equal the first number, performing the following for each virtual directory in turn, following the order of the virtual directories: calculating the difference obtained by subtracting the determined number of groups for the virtual directory from the number of groups it currently holds; when the difference is positive, determining the groups to be migrated from the virtual directory, wherein the groups to be migrated are ordered after all other groups of the virtual directory and their number equals the difference; and migrating the groups to be migrated to the next virtual directory in the order;
formula two is:
[Formula two: equation image BDA0001177446820000031, not reproduced in this text]
where $X$ is the number of all currently existing groups; $n$ is the number of all currently existing virtual directories; $x_1$ is the number of groups corresponding to any virtual directory other than the last one; and $x_2$ is the number of groups corresponding to the last virtual directory.
Further, after S4, the method further includes: when the group imbalance degree between the last virtual directory and the virtual directory adjacent to it is detected not to be within a preset second threshold range, establishing at least one group corresponding to the last virtual directory so that the group imbalance degree falls within the second threshold range; re-determining the mapping range corresponding to each of the currently existing groups; and performing the following for the metadata corresponding to each file: judging whether the group in which the metadata currently resides is the group corresponding to the mapping range that matches the metadata, and, if not, migrating the metadata to that group.
Further, the number of groups established is the minimum number that brings the group imbalance degree within the second threshold range.
Further, the group imbalance degree conforms to formula three:
[Formula three: equation image BDA0001177446820000032, not reproduced in this text]
where $Y$ is the group imbalance degree, $X_n$ is the number of groups corresponding to the last virtual directory, and $X_m$ is the number of groups corresponding to the virtual directory adjacent to the last virtual directory.
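The re-mapping step described above (re-determine the mapping ranges once groups have been added, then move only the metadata whose group has changed) can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the hash space 0..100, the equal-width split of that space, and the names `equal_width_bounds` and `remap` are all hypothetical, and formula three is not used because it is available only as an image reference.

```python
import bisect
from typing import Callable, List, Tuple


def equal_width_bounds(group_count: int, space_max: int = 100) -> List[int]:
    """Upper bounds of an assumed equal-width split of the hash space 0..space_max."""
    return [space_max * (i + 1) // group_count for i in range(group_count)]


def remap(groups: List[list], hash_of: Callable[[object], int],
          new_group_count: int) -> Tuple[List[list], int]:
    """Re-determine mapping ranges and migrate metadata whose group changed.

    Returns the new groups and the number of metadata entries that moved.
    """
    bounds = equal_width_bounds(new_group_count)
    new_groups: List[list] = [[] for _ in range(new_group_count)]
    moved = 0
    for old_index, group in enumerate(groups):
        for item in group:
            # First range whose upper bound is >= the hash value.
            new_index = bisect.bisect_left(bounds, hash_of(item))
            if new_index != old_index:
                moved += 1  # current group differs from the matching group
            new_groups[new_index].append(item)
    return new_groups, moved
```

In this sketch each metadata entry is compared against its newly matching group, and only entries for which the two differ count as migrated, mirroring the judgment step in the text.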
In another aspect, the present invention provides an apparatus for managing metadata, including:
a determining unit, configured to determine, for at least two groups corresponding to a directory, the mapping range corresponding to each group;
a mapping unit, configured to perform the following for the metadata corresponding to each file: storing the metadata in the group corresponding to the mapping range that matches the metadata, and triggering a first monitoring unit;
the first monitoring unit, configured to trigger a processing unit when the storage capacity of the directory is detected not to be within a preset first threshold range;
the processing unit, configured to establish a first number of virtual directories and to migrate all currently existing groups according to a predetermined group migration rule.
Further, the metadata corresponding to each file includes the name of the file;
the mapping unit is specifically configured to perform the following for the metadata corresponding to each file: calculating a hash value from the name of the file included in the metadata; determining a target mapping range that matches the hash value, that is, the mapping range in which the hash value lies; and storing the metadata in the group corresponding to the target mapping range.
Further, the storage capacity of the directory conforms to formula one:
$$X = \sum_{i=1}^{N} N_i$$
where $X$ is the storage capacity of the directory, $N$ is the number of the at least two groups, and $N_i$ is the number of metadata entries stored in the $i$-th of the at least two groups.
Further, the first number is the minimum number of virtual directories such that, after the migration processing, the storage capacity of every currently existing virtual directory lies within the first threshold range.
Further, the first monitoring unit is further configured to trigger the processing unit when the storage capacity of any virtual directory is detected not to be within the first threshold range.
Further, the group migration rule includes: with all currently existing groups arranged in order and all currently existing virtual directories arranged in order, determining the number of groups corresponding to each virtual directory according to formula two;
when the number of all currently existing virtual directories equals the first number, migrating the at least two groups to the currently existing virtual directories according to the determined number of groups for each virtual directory, wherein, for any two virtual directories, when the first virtual directory is ordered after the second virtual directory, every group in the first virtual directory is ordered after every group in the second virtual directory;
when the number of all currently existing virtual directories does not equal the first number, performing the following for each virtual directory in turn, following the order of the virtual directories: calculating the difference obtained by subtracting the determined number of groups for the virtual directory from the number of groups it currently holds; when the difference is positive, determining the groups to be migrated from the virtual directory, wherein the groups to be migrated are ordered after all other groups of the virtual directory and their number equals the difference; and migrating the groups to be migrated to the next virtual directory in the order;
formula two is:
[Formula two: equation image BDA0001177446820000051, not reproduced in this text]
where $X$ is the number of all currently existing groups; $n$ is the number of all currently existing virtual directories; $x_1$ is the number of groups corresponding to any virtual directory other than the last one; and $x_2$ is the number of groups corresponding to the last virtual directory.
Further, the apparatus for managing metadata further includes a second monitoring unit and a group establishing unit;
the second monitoring unit is configured to trigger the group establishing unit when the group imbalance degree between the last virtual directory and the virtual directory adjacent to it is detected not to be within a preset second threshold range;
the group establishing unit is configured to establish at least one group corresponding to the last virtual directory so that the group imbalance degree falls within the second threshold range, and to trigger the determining unit;
the determining unit is further configured to, on receiving the trigger signal sent by the group establishing unit, re-determine the mapping range corresponding to each of the currently existing groups, and to trigger the mapping unit;
the mapping unit is further configured to, on receiving the trigger signal sent by the determining unit, perform the following for the metadata corresponding to each file: judging whether the group in which the metadata currently resides is the group corresponding to the mapping range that matches the metadata, and, if not, migrating the metadata to that group.
Further, the number of groups established by the group establishing unit is the minimum number that brings the group imbalance degree within the second threshold range.
Further, the group imbalance degree conforms to formula three:
[Formula three: equation image BDA0001177446820000061, not reproduced in this text]
where $Y$ is the group imbalance degree, $X_n$ is the number of groups corresponding to the last virtual directory, and $X_m$ is the number of groups corresponding to the virtual directory adjacent to the last virtual directory.
The invention provides a method and a device for managing metadata. For at least two groups corresponding to a directory, the mapping range corresponding to each group is determined; the metadata corresponding to each file is stored in the group whose mapping range matches that metadata; and, when the storage capacity of the directory is detected to be outside a preset threshold range, a first number of virtual directories is established and all currently existing groups are migrated according to a group migration rule. Because the massive metadata is stored group by group, when the storage capacity of the directory exceeds the limit, virtual directories can be added and groups migrated to them, so that the metadata is stored in groups under the virtual directories and the over-limit problem is resolved. The invention can therefore improve metadata retrieval efficiency.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method of managing metadata according to an embodiment of the present invention;
FIG. 2 is a diagram of a framework for managing metadata according to an embodiment of the present invention;
FIG. 3 is a flow chart of another method for managing metadata provided by an embodiment of the present invention;
FIG. 4 is a diagram illustrating an apparatus for managing metadata according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another apparatus for managing metadata according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below with reference to the drawings. The described embodiments are some, but not all, embodiments of the present invention; all other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of the present invention.
As shown in FIG. 1, an embodiment of the present invention provides a method for managing metadata, which may include the following steps:
Step 101: for at least two groups corresponding to a directory, determine the mapping range corresponding to each group.
Step 102: perform the following for the metadata corresponding to each file: store the metadata in the group corresponding to the mapping range that matches the metadata.
Step 103: when the storage capacity of the directory is detected not to be within a preset first threshold range, execute step 104.
Step 104: establish a first number of virtual directories, and migrate all currently existing groups according to a predetermined group migration rule.
The embodiment of the invention provides a method for managing metadata. For at least two groups corresponding to a directory, the mapping range corresponding to each group is determined; the metadata corresponding to each file is stored in the group whose mapping range matches that metadata; and, when the storage capacity of the directory is detected to be outside a preset threshold range, a first number of virtual directories is established and all currently existing groups are migrated according to a group migration rule. Because the massive metadata is stored group by group, when the storage capacity of the directory exceeds the limit, virtual directories can be added and groups migrated to them, so that the metadata is stored in groups under the virtual directories and the over-limit problem is resolved. The embodiment of the invention can therefore improve metadata retrieval efficiency.
In detail, each file in the file system may include two parts, namely, the metadata corresponding to the file and the content of the file. In the embodiment of the invention, the metadata of each file can be managed in a unified way through the directory.
In detail, for a directory of a file system, the directory may correspond to a plurality of groups, and mapping ranges corresponding to the groups may be different, so that each metadata may be mapped to a corresponding group for storage and management.
In detail, the metadata corresponding to the file may include a name, time, authority, extended attribute, storage location, and the like of the file.
Therefore, in an embodiment of the present invention, to illustrate one possible way of mapping metadata to the corresponding group, the metadata corresponding to each file includes the name of the file;
step 102 includes performing the following for the metadata corresponding to each file: calculating a hash value from the name of the file included in the metadata; determining a target mapping range that matches the hash value, that is, the mapping range in which the hash value lies; and storing the metadata in the group corresponding to the target mapping range.
For example, suppose the directory of the file system currently corresponds to 10 groups whose mapping ranges are, in order, [0, 10], (10, 20], ..., (90, 100]. For the metadata corresponding to any file, a hash value can be calculated from the file name included in the metadata. Suppose the hash value calculated for a certain piece of metadata is 17. Because 17 falls within (10, 20], the mapping range of the second of the 10 groups, the metadata matches the mapping range (10, 20] and can be mapped to the second group for storage.
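The lookup in this example can be sketched in a few lines. The following is a minimal illustration, assuming the ten ranges given above; the source does not name a hash function, so the CRC-32 fold in `name_hash` is purely a stand-in, and the function names and the dictionary layout of a metadata entry are hypothetical.

```python
import bisect
import zlib
from typing import Dict, List

# Upper bounds of the ten mapping ranges [0, 10], (10, 20], ..., (90, 100].
UPPER_BOUNDS = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]


def name_hash(file_name: str) -> int:
    """Stand-in hash (an assumption): CRC-32 of the name folded into 0..100."""
    return zlib.crc32(file_name.encode("utf-8")) % 101


def group_index(hash_value: int) -> int:
    """Return the 0-based index of the group whose range contains hash_value."""
    if not 0 <= hash_value <= UPPER_BOUNDS[-1]:
        raise ValueError("hash value lies outside the mapped space")
    # bisect_left returns the first upper bound >= hash_value, which matches
    # ranges that are open on the left and closed on the right, such as (10, 20].
    return bisect.bisect_left(UPPER_BOUNDS, hash_value)


def store(groups: List[List[Dict[str, str]]], metadata: Dict[str, str]) -> int:
    """Store metadata in the group matching the hash of its file name."""
    index = group_index(name_hash(metadata["name"]))
    groups[index].append(metadata)
    return index
```

With these ranges, `group_index(17)` selects index 1, that is, the second group, as in the example above. A binary search over the upper bounds keeps the lookup logarithmic in the number of groups.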
As metadata is continuously generated, it is continuously stored in the corresponding groups, so the amount of metadata stored in each group corresponding to the directory keeps growing. Once this growth reaches a certain level, the storage capacity of the directory may exceed the limit. An over-limit directory hampers operations such as metadata retrieval and cache management, so the storage pressure on the directory can be shared by adding virtual directories.
In detail, by adding virtual directories, all the groups corresponding to the directory can be migrated to the newly added virtual directories, relieving the storage pressure on the directory. When the storage capacity of the directory exceeds the limit, if only one virtual directory were added and all the groups corresponding to the directory were migrated to it, the storage capacity of that virtual directory would also exceed the limit. Therefore, when the storage capacity of the directory exceeds the limit, that is, when virtual directories are established for the first time, at least two virtual directories need to be added by default. Thus, the first number in step 104 may be at least two.
For example, suppose the directory currently corresponds to 10 groups. If the storage capacity of the directory exceeds the limit, two virtual directories may be added and the 10 groups migrated to them so that each virtual directory corresponds to 5 groups. Because the 10 groups are migrated out of the directory, its storage pressure is greatly relieved and the over-limit problem is resolved. At the same time, neither virtual directory currently exceeds the limit.
Correspondingly, when any virtual directory has been established, metadata corresponding to that virtual directory is generated. In an embodiment of the present invention, this metadata is preferably stored in the directory, so that the virtual directories are managed uniformly through the directory. Each established virtual directory can serve as a subdirectory of the directory and be managed by it.
Based on the above, in an embodiment of the present invention, the first number is preferably the minimum number of virtual directories such that, after the migration processing, the storage capacity of every currently existing virtual directory lies within the first threshold range.
For example, as metadata is continuously stored, assume the current storage capacity of the directory has reached 120,000 entries and the first threshold range is set to 100,000 entries or fewer. The current storage capacity of the directory is then over the limit, so at least two virtual directories need to be added. After two virtual directories are added and the groups are divided equally between them by group migration, each new virtual directory holds about 60,000 entries, and neither is over the limit.
In general, given the data storage rate seen in practice, when the storage capacity of the directory exceeds the limit, that is, when virtual directories are established for the first time, the minimum number of virtual directories to add is usually 2.
Of course, in extreme or special cases, too much metadata may be stored within a certain period. For example, if the current storage capacity of the directory has reached 220,000 entries and two virtual directories are added with the groups divided equally, each new virtual directory holds about 110,000 entries, so both virtual directories, or at least one of them, exceed the limit. In this case the minimum number of virtual directories to add is 3.
It can be seen from the above that, when virtual directories are established for the first time, the minimum value of the first number is 2.
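The arithmetic in these examples can be checked with a short sketch. It assumes, as the examples do, that groups can be divided so that entries are spread roughly evenly across the virtual directories; the function name `min_new_virtual_directories` and its parameters are hypothetical.

```python
def min_new_virtual_directories(total_entries: int, limit: int,
                                existing: int = 0) -> int:
    """Smallest number of virtual directories to add so that an even split
    keeps every virtual directory at or below the limit.

    existing is the number of virtual directories already present; 0 means
    virtual directories are being established for the first time.
    """
    if total_entries < 0 or limit <= 0 or existing < 0:
        raise ValueError("invalid arguments")
    # Ceiling division without floating point.
    needed_for_capacity = -(-total_entries // limit)
    # At least two on first establishment, otherwise at least one more.
    floor_total = 2 if existing == 0 else existing + 1
    return max(needed_for_capacity, floor_total) - existing
```

For the figures above, 120,000 entries against a limit of 100,000 gives 2 and 220,000 entries gives 3. A real deployment would also have to account for uneven group sizes, which this even-split assumption ignores.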
Generally speaking, when the storage capacity of any virtual directory exceeds the limit, enough virtual directories should be newly established to ensure that, after the group migration is performed, the storage capacity of no virtual directory in the file system exceeds the limit.
In one embodiment of the present invention, the storage capacity conforms to the following formula (1):
$$X = \sum_{i=1}^{N} N_i \tag{1}$$
where $X$ is the storage capacity of the directory, $N$ is the number of the at least two groups, and $N_i$ is the number of metadata entries stored in the $i$-th of the at least two groups.
It can be seen that the storage capacity of the directory is the sum of the numbers of metadata entries stored in the groups corresponding to the directory.
Correspondingly, by the same principle as formula (1), in an embodiment of the present invention the storage capacity of any virtual directory may be the sum of the numbers of metadata entries stored in the groups corresponding to that virtual directory.
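Formula (1) and the first-threshold check of step 103 amount to a sum and a comparison. A minimal sketch, assuming each group is held as an in-memory list of metadata entries and the first threshold range is "at most `limit` entries" (both assumptions; the function names are hypothetical):

```python
from typing import Sequence


def storage_capacity(groups: Sequence[Sequence[object]]) -> int:
    """Formula (1): sum of the metadata counts of all groups."""
    return sum(len(group) for group in groups)


def within_first_threshold(groups: Sequence[Sequence[object]], limit: int) -> bool:
    """True when the capacity lies in the assumed range 0..limit."""
    return storage_capacity(groups) <= limit
```

The same two functions apply unchanged to a virtual directory, since its capacity is defined as the same sum over its own groups.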
After virtual directories have been created, massive metadata can be stored in groups under them. Of course, as new metadata is continuously stored, the storage capacity of a virtual directory may also come to exceed the limit.
As described above, when the storage capacity of the directory exceeds the limit, the problem can be resolved by adding virtual directories and migrating groups. Similarly, when the storage capacity of any virtual directory exceeds the limit, the problem can be resolved in the same way. Each time the storage capacity exceeds the limit, at least one virtual directory can be newly established. Thus, when virtual directories are not being established for the first time, the minimum value of the first number is 1.
Therefore, in an embodiment of the present invention, the method further includes, after step 104: when the storage capacity of any virtual directory is detected not to be within the first threshold range, executing step 104.
In detail, whenever the storage capacity of any virtual directory exceeds the limit, step 104 may be executed to add virtual directories and migrate groups so that no virtual directory exceeds the limit.
In detail, by executing step 104, some of the groups of a virtual directory can be migrated out, reducing the number of groups it holds and therefore its storage capacity, which resolves the over-limit problem for that virtual directory.
In an embodiment of the present invention, to illustrate one possible implementation of group migration, the group migration rule includes: with all currently existing groups arranged in order and all currently existing virtual directories arranged in order, determining the number of groups corresponding to each virtual directory according to the following formula (2);
when the number of all currently existing virtual directories equals the first number, migrating the at least two groups to the currently existing virtual directories according to the determined number of groups for each virtual directory, wherein, for any two virtual directories, when the first virtual directory is ordered after the second virtual directory, every group in the first virtual directory is ordered after every group in the second virtual directory;
when the number of all currently existing virtual directories does not equal the first number, performing the following for each virtual directory in turn, following the order of the virtual directories: calculating the difference obtained by subtracting the determined number of groups for the virtual directory from the number of groups it currently holds; when the difference is positive, determining the groups to be migrated from the virtual directory, wherein the groups to be migrated are ordered after all other groups of the virtual directory and their number equals the difference; and migrating the groups to be migrated to the next virtual directory in the order;
[Formula (2): equation image not reproduced in this text]
wherein X is the number of all currently existing groups; n is the number of all currently existing virtual directories; x1 is the number of groups corresponding to any virtual directory other than the last virtual directory among all currently existing virtual directories; and x2 is the number of groups corresponding to the last virtual directory.
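The image of formula (2) is not available in this text. A reconstruction consistent with the symbol definitions above and with the grouping policies enumerated in the worked example of this embodiment (20 groups split as 19/1 through 10/10 for two virtual directories, and as 9/9/2, 8/8/4 or 7/7/6 for three) would be the following. It is an inference from those examples, not the patent's verbatim formula:

```latex
X = (n - 1)\,x_1 + x_2, \qquad x_1 \ge x_2 \ge 1, \qquad x_1, x_2 \in \mathbb{Z}
```

Under this reading, every virtual directory except the last holds the same number of groups, and the last one holds the remainder, which is never larger than the others.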
In one embodiment of the invention, FIG. 2 shows a framework for managing metadata. The framework may represent a directory of a file system, under which 20 groups, virtual directory 1, and virtual directory 2 currently exist. Here, virtual directory 1 corresponds to groups 1 to 10, and virtual directory 2 corresponds to groups 11 to 20. Furthermore, each group stores the metadata that matches it (not shown in FIG. 2).
In detail, referring to FIG. 2, the group migration rule can be illustrated as follows. Assume that there are 20 groups under the directory of the file system, groups 1 to 20, and that all 20 groups initially correspond to the directory. When the storage capacity of the directory exceeds the limit, 2 virtual directories may typically be established: virtual directory 1 and virtual directory 2. Each virtual directory that is created is managed uniformly by the directory.
In FIG. 2, the 20 groups are arranged in order, and the virtual directories are also arranged in order, with virtual directory 1 first and virtual directory 2 second. According to formula (2) above, the possible grouping policies are: virtual directory 1 corresponds to 19 groups and virtual directory 2 to 1 group; virtual directory 1 corresponds to 18 groups and virtual directory 2 to 2 groups; and so on, up to virtual directory 1 and virtual directory 2 each corresponding to 10 groups.
Since the amount of metadata stored in each group usually does not differ much, whichever grouping policy is adopted, the storage capacity of virtual directory 1 and virtual directory 2 can be kept within the limit after group migration. However, when the difference between the numbers of groups corresponding to the two is too large, the grouping imbalance between them is high, which affects the metadata management stability of the whole file system.
Therefore, it is preferable to select the grouping strategy with the smallest difference between the numbers of the groups corresponding to the two, that is, the virtual directory 1 and the virtual directory 2 each correspond to 10 groups. Of course, based on different actual application requirements, other grouping strategies may also be adopted, so that the storage capacity of each virtual directory does not exceed the limit after the grouping migration.
For the grouping policy that the virtual directory 1 and the virtual directory 2 each correspond to 10 groups, the groups 1 to 10 may be migrated from the directory into the virtual directory 1, and the groups 11 to 20 may be migrated from the directory into the virtual directory 2. The post-migration situation may correspond to fig. 2.
Thus, there are currently two virtual directories in the file system: virtual directory 1 and virtual directory 2. As metadata continues to be stored, step 104 may be performed again when the storage capacity of any virtual directory exceeds the limit. A further virtual directory can then be created: virtual directory 3, which is third in sequence, after virtual directory 2. According to formula (2) above, the grouping policy may be: the numbers of groups corresponding to the three sequentially arranged virtual directories are 9, 9, and 2; or 8, 8, and 4; or 7, 7, and 6.
In the embodiment of the present invention, in order to reduce the grouping imbalance among different virtual directories as much as possible, the numbers of groups corresponding to the three sequentially arranged virtual directories are preferably 7, 7, and 6.
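The enumeration of grouping policies above can be sketched in a few lines. This is only an illustration: the constraint used (all directories but the last hold x1 groups, the last holds x2, with x1 >= x2 >= 1) is an assumption inferred from the listed policies, and the function names are hypothetical.

```python
def candidate_splits(total_groups: int, num_dirs: int) -> list[tuple[int, int]]:
    """Enumerate (x1, x2) pairs under the ASSUMED constraint
    total_groups == (num_dirs - 1) * x1 + x2 with x1 >= x2 >= 1.
    Requires num_dirs >= 2."""
    splits = []
    for x2 in range(1, total_groups + 1):
        remaining = total_groups - x2
        if remaining % (num_dirs - 1) != 0:
            continue  # the other directories could not share the rest evenly
        x1 = remaining // (num_dirs - 1)
        if x1 >= x2:
            splits.append((x1, x2))
    return splits


def most_balanced(total_groups: int, num_dirs: int) -> tuple[int, int]:
    """Pick the split with the smallest gap between x1 and x2."""
    return min(candidate_splits(total_groups, num_dirs), key=lambda s: s[0] - s[1])
```

For 20 groups and three virtual directories, `candidate_splits` yields the three policies named in the text, and `most_balanced` selects the 7, 7, 6 arrangement.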
Thus, based on the order of the virtual directories, the first virtual directory is processed first: virtual directory 1 currently corresponds to 10 groups, the determined number of groups is 7, and the difference is 3. It can therefore be determined that virtual directory 1 has 3 groups to be migrated, groups 8 to 10, and these 3 groups can be migrated from virtual directory 1 to the second virtual directory, virtual directory 2.
Then, for the second virtual directory: because virtual directory 2 has received groups 8 to 10, it currently corresponds to groups 8 to 20, 13 groups in total. The determined number of groups for virtual directory 2 is 7, and the difference is 6. It can therefore be determined that virtual directory 2 has 6 groups to be migrated, groups 15 to 20, and these 6 groups can be migrated from virtual directory 2 to the third virtual directory, virtual directory 3.
Subsequently, for the third virtual directory: because virtual directory 3 has received groups 15 to 20, it currently corresponds to 6 groups. The determined number of groups for virtual directory 3 is 6, and the difference is 0. It can therefore be determined that virtual directory 3 has no group to be migrated, and the group migration process ends.
As described above, with the metadata stored, step 104 may be executed again when the storage capacity of any virtual directory in the file system exceeds the limit. Thus, a virtual directory can be created: virtual directory 4. Then, after the execution of the packet migration is completed, the virtual directory 1 may correspond to the packet 1 to the packet 5, the virtual directory 2 may correspond to the packet 6 to the packet 10, the virtual directory 3 may correspond to the packet 11 to the packet 15, and the virtual directory 4 may correspond to the packet 16 to the packet 20.
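The cascading migration walked through above can be sketched as follows. The in-memory model (each virtual directory as an ordered list of group numbers) and the function name are hypothetical; the target counts are supplied by the caller rather than derived from formula (2).

```python
def cascade_migrate(dirs: list[list[int]], targets: list[int]) -> list[list[int]]:
    """Visit the virtual directories in order. When a directory holds more
    groups than its target, move its trailing surplus groups to the front
    of the next directory, which keeps the overall group order intact."""
    result = [list(d) for d in dirs]  # work on a copy
    for i in range(len(result) - 1):
        surplus = len(result[i]) - targets[i]
        if surplus > 0:
            moving = result[i][-surplus:]
            result[i] = result[i][:-surplus]
            result[i + 1] = moving + result[i + 1]
    return result


# Two full directories plus a newly created, empty third one.
before = [list(range(1, 11)), list(range(11, 21)), []]
after_three = cascade_migrate(before, [7, 7, 6])
# Adding a fourth, empty directory and rebalancing to 5 groups each.
after_four = cascade_migrate(after_three + [[]], [5, 5, 5, 5])
```

Prepending the moved groups to the next directory is one way to preserve the rule that groups in a later directory are ordered after groups in an earlier one.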
In this way, whenever it is detected that the storage capacity of any currently existing virtual directory exceeds the limit, step 104 can be executed again so that no virtual directory's storage capacity exceeds the limit, thereby ensuring the smooth execution of operations such as metadata retrieval and cache management.
Based on the above, after group migration is completed, the number of groups of the last virtual directory may differ from that of the virtual directory adjacent to it. For example, when there are currently three virtual directories, the numbers of groups corresponding to them may be 7, 7, and 6 in sequence. Since a high grouping imbalance between two virtual directories affects the metadata management stability of the whole file system, a high grouping imbalance can be adjusted by adding groups.
Therefore, in an embodiment of the present invention, in order to illustrate a possible implementation manner of adjusting the packet imbalance, the method further includes, after step 104: when the packet unbalance degree of the last virtual directory and the virtual directories adjacent to the last virtual directory is monitored not to be within a preset second threshold range, establishing at least one packet corresponding to the last virtual directory so that the packet unbalance degree is within the second threshold range; for all the groups which exist currently, the mapping range corresponding to each group is determined again; executing the following steps aiming at the metadata corresponding to each file: and judging whether the grouping where the metadata is currently located is the same as the grouping corresponding to the mapping range matched with the metadata, and if not, migrating the metadata to the grouping corresponding to the mapping range matched with the metadata.
Based on the above, for example, when there are currently three virtual directories, the numbers of groups corresponding to them may be 7, 7, and 6 in sequence. The first two virtual directories each correspond to 7 groups, so their grouping imbalance is acceptable, whereas a grouping imbalance may exist between the last two virtual directories. Thus, a new group can be added that corresponds to the last virtual directory, so that the last virtual directory also corresponds to 7 groups, thereby adjusting the grouping imbalance between the last virtual directory and its adjacent virtual directory.
For the newly established group, the corresponding mapping range needs to be determined, so the mapping range of each group currently existing in the file system needs to be adjusted. For example, there are currently 20 groups, groups 1 to 20, whose mapping ranges are [0, 10], (10, 20], …, (190, 200] in sequence. After a new group is added, there are 21 groups, so the total mapping range [0, 200] can be reallocated.
For example, the mapping ranges corresponding to the 21 groups may be [0, 10], (10, 20], …, (100, 110], (110, 119], (119, 128], …, (191, 200] in sequence. It can be seen that the mapping ranges of groups 1 to 11 are unchanged, whereas the mapping ranges of groups 12 to 20 are changed; for example, the mapping range of group 13 changes from (120, 130] to (119, 128]. The added group is group 21, which may correspond to the mapping range (191, 200].
Taking group 13 above as an example, assume that the hash value calculated for a certain piece of metadata is 129. Since 129 falls within (120, 130], the metadata is mapped to group 13. However, after the mapping range of group 13 becomes (119, 128], the calculated hash value is still 129, which now falls within the mapping range (128, 137] corresponding to group 14, so it can be determined that the metadata corresponds to group 14.
Based on the same principle, the corresponding migration operation can be performed on each piece of metadata to be migrated. Thus, a certain amount of metadata is also stored in the newly established group 21, and the amounts of metadata stored in the different groups do not differ greatly. This relieves the storage capacity of each virtual directory and reduces the grouping imbalance among different virtual directories.
Of course, for each new metadata, each new metadata may be mapped to a corresponding group for storage according to the current mapping range of each group.
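The range reallocation and metadata migration described above can be sketched as follows. This is a simplified model, not the patented implementation: each piece of metadata is represented only by its hash value, ranges are stored as a sorted list of upper bounds, and the function names are hypothetical.

```python
from bisect import bisect_left


def group_for(hash_value: int, upper_bounds: list[int]) -> int:
    """Return the 1-based group whose range (lower, upper] contains the
    hash value. The first range is closed on the left, so 0 maps to group 1."""
    return bisect_left(upper_bounds, hash_value) + 1


# Upper bounds before and after adding group 21, following the example.
old_bounds = list(range(10, 201, 10))                             # 10, 20, ..., 200
new_bounds = list(range(10, 111, 10)) + list(range(119, 201, 9))  # ..., 110, 119, 128, ..., 200


def migrate_metadata(groups: dict[int, list[int]], upper_bounds: list[int]) -> int:
    """Move every hash value whose current group differs from the group
    matching the new ranges. Returns the number of items moved."""
    for g in range(1, len(upper_bounds) + 1):
        groups.setdefault(g, [])
    moved = 0
    for current in list(groups):
        for h in list(groups[current]):
            target = group_for(h, upper_bounds)
            if target != current:
                groups[current].remove(h)
                groups[target].append(h)
                moved += 1
    return moved
```

With these bounds, the hash value 129 resolves to group 13 under the old ranges and to group 14 under the new ones, matching the example in the text.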
In an embodiment of the present invention, the number of groups established is the minimum number that brings the grouping imbalance within the second threshold range.
For example, assume that the last virtual directory corresponds to 8 groups and its adjacent virtual directory corresponds to 10 groups. The grouping imbalance between the two is 20%, which does not satisfy the set second threshold range of less than or equal to 10%. Adding 1 or 2 groups brings the number of groups of the last virtual directory to 9 or 10 and the grouping imbalance to 10% or 0%, either of which satisfies the second threshold range. Since either 1 or 2 groups could be established, the minimum is taken, that is, 1 group is established.
In one embodiment of the present invention, to illustrate one possible implementation of calculating the grouping imbalance, the grouping imbalance conforms to the following equation (3);
[Equation (3): equation image not reproduced in this text]
wherein Y is the grouping imbalance, Xn is the number of groups corresponding to the last virtual directory, and Xm is the number of groups corresponding to the virtual directory adjacent to the last virtual directory.
In detail, the lower the grouping imbalance, the closer the number of the groupings corresponding to the two virtual directories is, which is beneficial to ensuring the metadata management stability of the whole file system.
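The imbalance check and the choice of the minimum number of new groups can be sketched as follows. The form used for equation (3), |Xm - Xn| / Xm, is an assumption: it reproduces the percentages quoted in this description (20% for 8 versus 10 groups, about 14.2% for 6 versus 7), but the original equation image is not available. Function names are hypothetical.

```python
from fractions import Fraction


def imbalance(last_count: int, adjacent_count: int) -> Fraction:
    """ASSUMED form of equation (3): |Xm - Xn| / Xm, as an exact fraction."""
    return Fraction(abs(adjacent_count - last_count), adjacent_count)


def min_groups_to_add(last_count: int, adjacent_count: int, limit: Fraction) -> int:
    """Smallest number of groups to add to the last virtual directory so
    that the imbalance is within the limit (inclusive). Stops once the two
    counts are equal, since adding more would not help."""
    added = 0
    while (last_count + added < adjacent_count
           and imbalance(last_count + added, adjacent_count) > limit):
        added += 1
    return added
```

Exact fractions are used so that the boundary case (9 versus 10 groups, exactly 10%) compares reliably against the limit.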
In summary, the embodiment of the present invention provides a method for managing metadata, which manages mass metadata through different groups corresponding to virtual directories by dynamically creating new virtual directories in real time, so as to avoid that all metadata are always stored in the directories, thereby improving the efficiency of retrieving mass metadata. When the storage capacity of each virtual directory exceeds the limit, the problem can be solved through the implementation modes of newly building the virtual directory and migrating the groups, and when the groups among different virtual directories are unbalanced, the problem can be solved through the implementation modes of newly building the groups and migrating the metadata, so that the whole file system can be dynamically expanded according to actual application.
As shown in fig. 3, an embodiment of the present invention provides another method for managing metadata, which specifically includes the following steps:
Step 301: For the 20 groups corresponding to the directory, determine the mapping range corresponding to each group.
In detail, 20 sequentially arranged groups, groups 1 to 20, currently exist in the file system, and all 20 groups correspond to the directory. The mapping ranges corresponding to the 20 groups are [0, 10], (10, 20], …, (190, 200] in sequence.
Step 302: Execute the following for the metadata corresponding to each file: calculate a hash value from the name of the file contained in the metadata; determine a target mapping range matching the hash value, wherein the hash value lies within the target mapping range; and store the metadata into the group corresponding to the target mapping range.
In detail, the metadata corresponding to each file includes the name of the file. For example, the metadata corresponding to a file may include the name, time, permissions, extended attributes, storage location, etc. of the file.
For example, assume that the hash value calculated for a certain piece of metadata is 17. Since 17 falls within the mapping range (10, 20] corresponding to the second of the 20 groups, group 2, the metadata matches the mapping range (10, 20] and can be mapped into group 2 for storage.
Step 303: When it is detected that the storage capacity of the directory is not within the preset first threshold range, step 304 is executed.
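Step 302 can be sketched as follows. The patent does not specify the hash function, so folding a SHA-256 digest into 0..200 is purely an assumption made for illustration, as are the function names and the dictionary layout of the metadata.

```python
import hashlib
from bisect import bisect_left

# Upper bounds of the 20 ranges [0, 10], (10, 20], ..., (190, 200].
UPPER_BOUNDS = list(range(10, 201, 10))


def name_hash(file_name: str, modulus: int = 201) -> int:
    """Hypothetical hash: reduce a SHA-256 digest of the file name to 0..200."""
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % modulus


def store(groups: dict[int, list[dict]], metadata: dict) -> int:
    """Store the metadata in the group whose range contains the hash of its
    file name, and return that 1-based group number."""
    h = name_hash(metadata["name"])
    group = bisect_left(UPPER_BOUNDS, h) + 1
    groups.setdefault(group, []).append(metadata)
    return group
```

Because the hash depends only on the file name, the same file always resolves to the same group for as long as the mapping ranges are unchanged.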
In detail, the storage capacity can be obtained by calculation by the above formula (1).
As metadata continues to be stored, assume that the storage capacity of the directory is calculated to be 110,000, which is not within the first threshold range of less than or equal to 100,000.
Step 304: establishing a first number of virtual directories, and performing corresponding migration processing on all the currently existing groups according to a predetermined group migration rule.
In detail, the first number may be the minimum number of virtual directories such that, after the migration processing, the storage capacity of every currently existing virtual directory is within the first threshold range.
Based on the above, the minimum value of the first number here is 2. Thus, 2 virtual directories can be established: virtual directory 1 and virtual directory 2. Virtual directory 2 is arranged after virtual directory 1, so virtual directory 1 is first in sequence and virtual directory 2 is second. If another virtual directory is subsequently created, it becomes third in sequence, and so on.
In detail, after each virtual directory is established, metadata corresponding to the virtual directory may be generated. The generated metadata corresponding to each virtual directory can be stored in the directory so as to be managed by the directory in a unified manner.
In detail, the group migration rule may be: when all currently existing groups are arranged in sequence and all currently existing virtual directories are arranged in sequence, the number of groups corresponding to each virtual directory is determined according to formula (2) above;
when the number of all the currently existing virtual directories is the first number, the at least two groups are respectively migrated to each currently existing virtual directory according to the determined number of the groups corresponding to each virtual directory, wherein for any two virtual directories, when the arrangement sequence of the first virtual directory is behind the arrangement sequence of the second virtual directory, the arrangement sequence of each group in the first virtual directory is behind the arrangement sequence of each group in the second virtual directory;
when the number of all the currently existing virtual directories is not the first number, sequentially executing for each virtual directory based on the arrangement sequence of the virtual directories: calculating the difference value of the number of the groups corresponding to the virtual directory minus the determined number of the groups corresponding to the virtual directory; when the difference value is judged to be a positive number, determining the to-be-migrated packets corresponding to the virtual directory, wherein the arrangement sequence of the to-be-migrated packets is positioned behind other packets in all the packets corresponding to the virtual directory, and the number of the to-be-migrated packets is equal to the difference value; and migrating the packet to be migrated to the next virtual directory which is sequentially arranged behind the virtual directory.
In this embodiment, there are two virtual directories: virtual directory 1 and virtual directory 2. The number of all currently existing virtual directories is 2, which is equal to the first number, 2; that is, the currently existing virtual directories result from the first establishment of virtual directories. Therefore, according to the group migration rule above, groups 1 to 10 can be migrated from the directory into virtual directory 1, and groups 11 to 20 can be migrated from the directory into virtual directory 2.
Step 305: when it is detected that the storage capacity of any virtual directory does not fall within the first threshold range, step 304 is executed, and step 306 is executed.
Correspondingly, based on the same implementation principle, as in the above formula (1), in an embodiment of the present invention, for any virtual directory, the storage capacity of the virtual directory may be the sum of the numbers of metadata stored in each group corresponding to the virtual directory.
For all the virtual directories currently existing in the file system, if the storage capacity of any virtual directory exceeds the limit, step 304 may be executed to create a new virtual directory and perform corresponding packet migration, so as to solve the problem of storage capacity exceeding.
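The over-limit check can be sketched as follows, taking the storage capacity of a virtual directory as the sum of the metadata counts of its groups, as stated above. The threshold of 100,000 comes from the example in this embodiment; the data layout and names are hypothetical.

```python
FIRST_THRESHOLD = 100_000  # upper limit taken from the example in this embodiment


def storage_capacity(group_counts: dict[int, int]) -> int:
    """Sum of the numbers of metadata items stored in each group."""
    return sum(group_counts.values())


def over_limit_directories(dirs: dict[str, dict[int, int]],
                           limit: int = FIRST_THRESHOLD) -> list[str]:
    """Names of the virtual directories whose storage capacity exceeds the limit."""
    return [name for name, counts in dirs.items() if storage_capacity(counts) > limit]


# Illustrative data: virtual directory 1 holds 110,000 items, virtual directory 2 holds 50,000.
dirs = {
    "virtual directory 1": {g: 11_000 for g in range(1, 11)},
    "virtual directory 2": {g: 5_000 for g in range(11, 21)},
}
```

A non-empty result from `over_limit_directories` corresponds to the condition under which step 304 would be executed again.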
For example, if it is detected that virtual directory 1 exceeds the limit, a virtual directory can be created: virtual directory 3. At this time, the number of all currently existing virtual directories is 3, which is not equal to the first number, 1; that is, the currently existing virtual directories do not result from the first establishment of virtual directories.
In this way, according to the group migration rule, groups 8 to 10 can be migrated from virtual directory 1 to virtual directory 2, and groups 15 to 20 can be migrated from virtual directory 2 to virtual directory 3. The numbers of groups corresponding to the three virtual directories are then 7, 7, and 6, respectively.
Similarly, no matter how many virtual directories exist currently, as long as it is monitored that the storage capacity of any virtual directory exceeds the limit, step 304 can be repeatedly executed to create a new virtual directory and perform corresponding group migration, so as to solve the problem of storage capacity exceeding.
Step 306: when the packet unbalance degree of the last virtual directory and the virtual directories adjacent to the last virtual directory is monitored not to be within a preset second threshold range, at least one packet corresponding to the last virtual directory is established, so that the packet unbalance degree is within the second threshold range.
In detail, the packet unbalance degree may be obtained by calculating through the above equation (3).
In detail, the number of groups established may be the minimum number that brings the grouping imbalance within the second threshold range.
Assume that there are currently three virtual directories, and the numbers of groups corresponding to them are 7, 7, and 6, respectively. The grouping imbalance between virtual directory 2 and virtual directory 3 is calculated to be 14.2%, which is not within the second threshold range of less than 10%, so a group corresponding to virtual directory 3 can be created: group 21.
Step 307: Re-determine the mapping range corresponding to each of the currently existing groups.
After one new group is added, there are currently 21 groups, so the total mapping range [0, 200] can be reallocated. For example, the mapping ranges corresponding to the 21 groups may be [0, 10], (10, 20], …, (100, 110], (110, 119], (119, 128], …, (191, 200] in sequence.
Step 308: Execute the following for the metadata corresponding to each file: judge whether the group where the metadata is currently located is the same as the group corresponding to the mapping range matching the metadata; if so, end the current flow; otherwise, migrate the metadata to the group corresponding to the mapping range matching the metadata.
Taking group 13 as an example, assume that the hash value calculated for a certain piece of metadata is 129, so the metadata is currently located in group 13. However, after the mapping range of group 13 becomes (119, 128], the calculated hash value is still 129, which now falls within the mapping range (128, 137] corresponding to group 14. The metadata can therefore be migrated from group 13 into group 14.
Based on the same implementation principle, corresponding migration operation can be performed on each metadata to be migrated.
Of course, for each new metadata, each new metadata may be mapped to a corresponding group for storage according to the current mapping range of each group.
As shown in fig. 4, an embodiment of the present invention provides an apparatus for managing metadata, including:
a determining unit 401, configured to determine, for at least two groups corresponding to a directory, a mapping range corresponding to each of the groups;
a mapping unit 402, configured to perform the following for the metadata corresponding to each file: storing the metadata into the group corresponding to the mapping range matched with the metadata, and triggering the first monitoring unit 403;
the first monitoring unit 403 is configured to trigger the processing unit 404 when it is monitored that the storage capacity of the directory is not within a preset first threshold range;
the processing unit 404 is configured to establish a first number of virtual directories, and perform corresponding migration processing on all currently existing packets according to a predetermined packet migration rule.
In an embodiment of the present invention, the metadata corresponding to each file includes a name of the file;
the mapping unit 402 is specifically configured to execute the following for the metadata corresponding to each file: calculating a hash value from the name of the file included in the metadata; determining a target mapping range matched with the hash value, wherein the hash value is located in the target mapping range; and storing the metadata into the group corresponding to the target mapping range.
In one embodiment of the present invention, the storage capacity of the directory conforms to equation (1) above.
In an embodiment of the present invention, the first number is the minimum number of virtual directories such that, after the migration processing, the storage capacity of every currently existing virtual directory is within the first threshold range.
In an embodiment of the present invention, the first monitoring unit 403 is further configured to trigger the processing unit 404 when it is detected that the storage capacity of any virtual directory is not within the first threshold range.
In an embodiment of the present invention, the group migration rule includes: when all currently existing groups are arranged in sequence and all currently existing virtual directories are arranged in sequence, the number of groups corresponding to each virtual directory is determined according to formula (2) above;
when the number of all the currently existing virtual directories is the first number, the at least two groups are respectively migrated to each currently existing virtual directory according to the determined number of the groups corresponding to each virtual directory, wherein for any two virtual directories, when the arrangement sequence of the first virtual directory is behind the arrangement sequence of the second virtual directory, the arrangement sequence of each group in the first virtual directory is behind the arrangement sequence of each group in the second virtual directory;
when the number of all the currently existing virtual directories is not the first number, sequentially executing for each virtual directory based on the arrangement sequence of the virtual directories: calculating the difference value of the number of the groups corresponding to the virtual directory minus the determined number of the groups corresponding to the virtual directory; when the difference value is judged to be a positive number, determining the to-be-migrated packets corresponding to the virtual directory, wherein the arrangement sequence of the to-be-migrated packets is positioned behind other packets in all the packets corresponding to the virtual directory, and the number of the to-be-migrated packets is equal to the difference value; and migrating the packet to be migrated to the next virtual directory which is sequentially arranged behind the virtual directory.
In an embodiment of the present invention, referring to fig. 5, the apparatus for managing metadata may further include: a second monitoring unit 501, a grouping establishing unit 502;
the second monitoring unit 501 is configured to trigger the grouping establishing unit 502 when it is monitored that the grouping imbalance between the last virtual directory and the virtual directory adjacent to the last virtual directory is not within a preset second threshold range;
the grouping establishing unit 502 is configured to establish at least one grouping corresponding to the last-bit virtual directory, so that the grouping imbalance is within the second threshold range, and trigger the determining unit 401;
the determining unit 401 is further configured to, when receiving the trigger signal sent by the packet establishing unit 502, re-determine, for all currently existing packets, a mapping range corresponding to each packet, and trigger the mapping unit 402;
the mapping unit 402 is further configured to, when receiving the trigger signal sent by the determining unit 401, execute, for each file, the following steps for the metadata corresponding to each file: and judging whether the grouping where the metadata is currently located is the same as the grouping corresponding to the mapping range matched with the metadata, and if not, migrating the metadata to the grouping corresponding to the mapping range matched with the metadata.
In one embodiment of the present invention, the number of groups established by the grouping establishing unit 502 is the minimum number that brings the grouping imbalance within the second threshold range.
In one embodiment of the present invention, the packet imbalance corresponds to equation (3) above.
Because the information interaction, execution process, and other contents between the units in the device are based on the same concept as the method embodiment of the present invention, specific contents may refer to the description in the method embodiment of the present invention, and are not described herein again.
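The trigger chain among units 401 to 404 can be sketched as a single class. This is a hypothetical wiring for illustration only: metadata is represented by its hash value, the mapping ranges are supplied as a list of upper bounds, and the processing unit is reduced to a callback supplied by the caller.

```python
from bisect import bisect_left
from typing import Callable


class MetadataDevice:
    """Illustrative stand-in for the apparatus; not the patented design."""

    def __init__(self, upper_bounds: list[int], limit: int,
                 on_over_limit: Callable[["MetadataDevice"], None]) -> None:
        self.upper_bounds = upper_bounds    # result of determining unit 401
        self.limit = limit                  # first threshold
        self.on_over_limit = on_over_limit  # stands in for processing unit 404
        self.groups: dict[int, list[int]] = {}

    def map_metadata(self, hash_value: int) -> int:
        """Mapping unit 402: store the item, then trigger monitoring."""
        group = bisect_left(self.upper_bounds, hash_value) + 1
        self.groups.setdefault(group, []).append(hash_value)
        self.monitor()
        return group

    def monitor(self) -> None:
        """First monitoring unit 403: trigger processing when over the limit."""
        stored = sum(len(items) for items in self.groups.values())
        if stored > self.limit:
            self.on_over_limit(self)
```

In a fuller sketch, the callback would create virtual directories and migrate groups; here it only marks the point at which the processing unit would be triggered.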
In summary, the embodiments of the present invention have at least the following advantages:
1. In the embodiment of the invention, for at least two groups corresponding to a directory, a mapping range corresponding to each group is determined; the metadata corresponding to each file is stored into the group corresponding to the mapping range matched with the metadata; and when it is detected that the storage capacity of the directory is not within the preset threshold range, a first number of virtual directories are established, and corresponding migration processing is carried out on all currently existing groups according to a group migration rule. Because the massive metadata is stored in separate groups, when the storage capacity of the directory exceeds the limit, virtual directories can be added and groups migrated so that the metadata is stored group by group under the virtual directories, which solves the problem of the directory's storage capacity exceeding the limit. Therefore, the embodiment of the invention can improve metadata retrieval efficiency.
2. The embodiment of the invention provides a method for managing metadata in which massive metadata is managed through virtual directories that are created dynamically in real time and through the different groups corresponding to those virtual directories, so that not all metadata is always stored in the directory, which improves the retrieval efficiency of massive metadata. When the storage capacity of any virtual directory exceeds the limit, the problem can be solved by creating new virtual directories and migrating groups; and when the groups among different virtual directories are unbalanced, the problem can be solved by creating new groups and migrating metadata, so that the whole file system can be dynamically expanded according to the actual application.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other similar elements in a process, method, article, or apparatus that comprises the element.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The storage medium includes any medium capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
Finally, it is to be noted that: the above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (6)

1. A method of managing metadata, comprising:
S1: for at least two groups corresponding to a directory, determining a mapping range corresponding to each group;
S2: performing the following for the metadata corresponding to each file: storing the metadata into the group corresponding to the mapping range that matches the metadata;
S3: when the storage capacity of the directory is monitored not to be within a preset first threshold range, executing S4;
S4: establishing a first number of virtual directories, and migrating all currently existing groups according to a predetermined group migration rule;
the metadata corresponding to each file comprises the name of the file;
S2 comprises: performing the following for the metadata corresponding to each file: calculating a hash value from the name of the file included in the metadata; determining a target mapping range that matches the hash value, wherein the hash value lies within the target mapping range; and storing the metadata into the group corresponding to the target mapping range;
and/or,
the storage capacity of the directory conforms to formula one,
wherein formula one comprises:
Figure FDA0002228964490000011
wherein X is the storage capacity of the directory, N is the number of the at least two groups, and n_i is the number of metadata items stored in the i-th group of the at least two groups;
and/or,
the first number is the minimum number that enables the storage capacity of every currently existing virtual directory to be within the first threshold range after the migration processing;
and/or,
further comprising, after S4: when it is monitored that the storage capacity of any virtual directory is not within the first threshold range, executing S4;
the group migration rule comprises: with all currently existing groups arranged in sequence and all currently existing virtual directories arranged in sequence, determining the number of groups corresponding to each virtual directory according to formula two;
when the number of all currently existing virtual directories is the first number, migrating the at least two groups to the currently existing virtual directories according to the determined number of groups corresponding to each virtual directory, wherein for any two virtual directories, when the first virtual directory is arranged after the second virtual directory, each group in the first virtual directory is arranged after each group in the second virtual directory;
when the number of all currently existing virtual directories is not the first number, performing the following for each virtual directory in turn, based on the arrangement sequence of the virtual directories: calculating the difference obtained by subtracting the determined number of groups corresponding to the virtual directory from the number of groups currently corresponding to the virtual directory; when the difference is positive, determining the groups to be migrated for the virtual directory, wherein the groups to be migrated are arranged after all other groups corresponding to the virtual directory, and the number of groups to be migrated equals the difference; and migrating the groups to be migrated to the next virtual directory arranged after the virtual directory;
formula two comprises:
Figure FDA0002228964490000021
wherein X is the number of all currently existing groups; N is the number of all currently existing virtual directories; x_1 is the number of groups corresponding to any virtual directory other than the last virtual directory among all currently existing virtual directories; and x_2 is the number of groups corresponding to the last virtual directory.
2. The method of claim 1,
further comprising, after S4: when the group imbalance degree between the last virtual directory and the virtual directory adjacent to the last virtual directory is monitored not to be within a preset second threshold range, establishing at least one group corresponding to the last virtual directory so that the group imbalance degree falls within the second threshold range; re-determining, for all currently existing groups, the mapping range corresponding to each group; and performing the following for the metadata corresponding to each file: judging whether the group in which the metadata is currently located is the same as the group corresponding to the mapping range that matches the metadata, and if not, migrating the metadata to the group corresponding to the mapping range that matches the metadata.
3. The method of claim 2,
the number of the at least one established group is the minimum number that brings the group imbalance degree within the second threshold range;
and/or,
the group imbalance degree conforms to formula three,
wherein formula three comprises:
Figure FDA0002228964490000031
wherein Y is the group imbalance degree, X_n is the number of groups corresponding to the last virtual directory, and X_m is the number of groups corresponding to the virtual directory adjacent to the last virtual directory.
4. An apparatus for managing metadata, comprising:
a determining unit, configured to determine, for at least two groups corresponding to a directory, a mapping range corresponding to each group;
a mapping unit, configured to perform the following for the metadata corresponding to each file: storing the metadata into the group corresponding to the mapping range that matches the metadata, and triggering a first monitoring unit;
the first monitoring unit, configured to trigger a processing unit when the storage capacity of the directory is monitored not to be within a preset first threshold range;
the processing unit, configured to establish a first number of virtual directories and to migrate all currently existing groups according to a predetermined group migration rule;
the metadata corresponding to each file comprises the name of the file;
the mapping unit is specifically configured to perform the following for the metadata corresponding to each file: calculating a hash value from the name of the file included in the metadata; determining a target mapping range that matches the hash value, wherein the hash value lies within the target mapping range; and storing the metadata into the group corresponding to the target mapping range;
and/or,
the storage capacity of the directory conforms to formula one,
wherein formula one comprises:
Figure FDA0002228964490000041
wherein X is the storage capacity of the directory, N is the number of the at least two groups, and n_i is the number of metadata items stored in the i-th group of the at least two groups;
and/or,
the first number is the minimum number that enables the storage capacity of every currently existing virtual directory to be within the first threshold range after the migration processing;
and/or,
the first monitoring unit is further configured to trigger the processing unit when it is monitored that the storage capacity of any virtual directory is not within the first threshold range;
the group migration rule comprises: with all currently existing groups arranged in sequence and all currently existing virtual directories arranged in sequence, determining the number of groups corresponding to each virtual directory according to formula two;
when the number of all currently existing virtual directories is the first number, migrating the at least two groups to the currently existing virtual directories according to the determined number of groups corresponding to each virtual directory, wherein for any two virtual directories, when the first virtual directory is arranged after the second virtual directory, each group in the first virtual directory is arranged after each group in the second virtual directory;
when the number of all currently existing virtual directories is not the first number, performing the following for each virtual directory in turn, based on the arrangement sequence of the virtual directories: calculating the difference obtained by subtracting the determined number of groups corresponding to the virtual directory from the number of groups currently corresponding to the virtual directory; when the difference is positive, determining the groups to be migrated for the virtual directory, wherein the groups to be migrated are arranged after all other groups corresponding to the virtual directory, and the number of groups to be migrated equals the difference; and migrating the groups to be migrated to the next virtual directory arranged after the virtual directory;
formula two comprises:
Figure FDA0002228964490000051
wherein X is the number of all currently existing groups; N is the number of all currently existing virtual directories; x_1 is the number of groups corresponding to any virtual directory other than the last virtual directory among all currently existing virtual directories; and x_2 is the number of groups corresponding to the last virtual directory.
5. The apparatus for managing metadata according to claim 4,
further comprising: a second monitoring unit and a group establishing unit;
the second monitoring unit is configured to trigger the group establishing unit when the group imbalance degree between the last virtual directory and the virtual directory adjacent to the last virtual directory is monitored not to be within a preset second threshold range;
the group establishing unit is configured to establish at least one group corresponding to the last virtual directory so that the group imbalance degree falls within the second threshold range, and to trigger the determining unit;
the determining unit is further configured to, upon receiving the trigger signal sent by the group establishing unit, re-determine, for all currently existing groups, the mapping range corresponding to each group, and to trigger the mapping unit;
the mapping unit is further configured to, upon receiving the trigger signal sent by the determining unit, perform the following for the metadata corresponding to each file: judging whether the group in which the metadata is currently located is the same as the group corresponding to the mapping range that matches the metadata, and if not, migrating the metadata to the group corresponding to the mapping range that matches the metadata.
6. The apparatus for managing metadata according to claim 5,
the number of the at least one group established by the group establishing unit is the minimum number that brings the group imbalance degree within the second threshold range;
and/or,
the group imbalance degree conforms to formula three,
wherein formula three comprises:
Figure FDA0002228964490000061
wherein Y is the group imbalance degree, X_n is the number of groups corresponding to the last virtual directory, and X_m is the number of groups corresponding to the virtual directory adjacent to the last virtual directory.
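The cascading branch of the group migration rule in claims 1 and 4 (the case where virtual directories already hold groups and surplus groups are passed to the next directory) can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the function name `cascade_migrate` is hypothetical, the per-directory target counts are passed in as a parameter because formula two is available only as an image, and migrated groups are placed at the front of the next directory, which is one way to keep groups in later directories ordered after groups in earlier ones.

```python
def cascade_migrate(directories, targets):
    """Move surplus trailing groups of each virtual directory to the next one.

    directories: ordered list of virtual directories, each an ordered list of group ids.
    targets: desired number of groups per virtual directory (assumed to be precomputed).
    Returns a new list of directories; the input is left unchanged.
    """
    if len(directories) != len(targets):
        raise ValueError("one target count is required per virtual directory")

    result = [list(groups) for groups in directories]
    # The last directory has no successor, so it only receives groups.
    for i in range(len(result) - 1):
        surplus = len(result[i]) - targets[i]
        if surplus > 0:
            # The groups to migrate are the ones arranged last in this directory.
            moving = result[i][-surplus:]
            result[i] = result[i][:-surplus]
            # Prepending keeps the global group order across directories (assumption).
            result[i + 1] = moving + result[i + 1]
    return result


# Usage: a third, empty virtual directory has just been created.
before = [[0, 1, 2], [3, 4, 5], []]
after = cascade_migrate(before, targets=[2, 2, 2])
print(after)  # [[0, 1], [2, 3], [4, 5]]
```

Because each directory hands only its trailing groups to its immediate successor, a single pass in directory order is enough, and no group ever moves past more than the directories that actually overflow.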
CN201611139129.6A 2016-12-12 2016-12-12 Method and device for managing metadata Active CN106777062B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611139129.6A CN106777062B (en) 2016-12-12 2016-12-12 Method and device for managing metadata

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611139129.6A CN106777062B (en) 2016-12-12 2016-12-12 Method and device for managing metadata

Publications (2)

Publication Number Publication Date
CN106777062A (en) 2017-05-31
CN106777062B (en) 2020-03-10

Family

ID=58879884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611139129.6A Active CN106777062B (en) 2016-12-12 2016-12-12 Method and device for managing metadata

Country Status (1)

Country Link
CN (1) CN106777062B (en)

Also Published As

Publication number Publication date
CN106777062A (en) 2017-05-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant