CN115563073A - Method and device for data processing of distributed metadata and electronic equipment - Google Patents

Publication number
CN115563073A
Authority
CN
China
Prior art keywords
data
target
information
processed
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211170926.6A
Other languages
Chinese (zh)
Inventor
肖永玲
瞿天善
张旭明
王豪迈
胥昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xingchen Tianhe Technology Co ltd
Original Assignee
Beijing Xingchen Tianhe Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xingchen Tianhe Technology Co ltd filed Critical Beijing Xingchen Tianhe Technology Co ltd
Priority to CN202211170926.6A priority Critical patent/CN115563073A/en
Publication of CN115563073A publication Critical patent/CN115563073A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing

Abstract

The invention discloses a method and a device for data processing of distributed metadata, and electronic equipment, and relates to the technical field of computers. The method comprises the following steps: acquiring a plurality of pieces of data to be processed, wherein the data to be processed at least comprises data information and information of a plurality of file metadata corresponding to the data information, the data information represents position information of the data to be processed, and the position information at least comprises an identification code of the data to be processed and file names of a plurality of files corresponding to the file metadata; performing hash calculation on the data information of the data to be processed to obtain a plurality of hash values; determining a plurality of target nodes based on the plurality of hash values; and distributing the file metadata information corresponding to the data to be processed to the plurality of target nodes. The invention solves the technical problem of unbalanced dynamic distribution of file metadata in the prior art.

Description

Method and device for data processing of distributed metadata and electronic equipment
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for data processing of distributed metadata and electronic equipment.
Background
In the prior art, load balancing of distributed file metadata is generally implemented by static sub-tree partitioning: data is manually partitioned and allocated directly to a particular service node, and when load imbalance occurs, an administrator manually reallocates the data. Another approach is dynamic sub-tree partitioning, which monitors the load of the cluster nodes in real time and dynamically adjusts how the sub-trees are distributed across the nodes. However, this approach is suited only to various abnormal scenarios; when migrating a large amount of data causes service jitter, performance suffers, and the approach is complex to implement and has weak stability.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a data processing method and device for distributed metadata and electronic equipment, which are used for at least solving the technical problem of unbalanced dynamic distribution of file metadata in the prior art.
According to an aspect of an embodiment of the present invention, there is provided a method for data processing of distributed metadata, including: acquiring a plurality of data to be processed, wherein the data to be processed at least comprises data information and information of a plurality of file metadata corresponding to the data information, the data information represents position information of the data to be processed, and the position information at least comprises an identification code of the data to be processed and file names of a plurality of files corresponding to the file metadata; performing hash calculation on the data information of the data to be processed to obtain a plurality of hash values; determining a plurality of target nodes based on the plurality of hash values; and distributing the file metadata information corresponding to the data to be processed to a plurality of target nodes.
Further, the method for data processing of distributed metadata further comprises: after the file metadata information corresponding to the data to be processed is distributed to the plurality of target nodes, responding to a data query instruction, and determining a query type corresponding to the data to be queried indicated by the data query instruction; when the query type is a directory query type, determining a plurality of target nodes corresponding to a target directory based on an identification code of the target directory, wherein the target directory at least comprises a plurality of files, the target directory indexes the plurality of files, and each file corresponds to one target node; and querying the data to be queried stored in each target node.
Further, the method for data processing of distributed metadata further comprises: after determining the query type corresponding to the data to be queried indicated by the data query instruction, when the query type is a file query type, determining target information corresponding to a target file based on position information of the target file; determining a corresponding hash value based on the target information; and determining a target node based on the hash value, and querying the data to be queried from the target node.
Further, the method for data processing of distributed metadata further comprises: after responding to the data query instruction and determining the query type corresponding to the data to be queried indicated by the data query instruction, when there are a plurality of target files, determining target information corresponding to each target file based on position information of the plurality of target files; determining a plurality of hash values based on the target information corresponding to each target file; and determining a target node corresponding to each hash value based on the plurality of hash values, and querying the data to be queried from each target node.
Further, the method for data processing of distributed metadata further comprises: after determining a plurality of target nodes based on the plurality of hash values, creating at least one replica node corresponding to each target node; and backing up the file metadata information corresponding to the data to be processed based on the replica node.
Further, the method for data processing of distributed metadata further comprises: after the file metadata information corresponding to the data to be processed is backed up based on the replica node, responding to a data query instruction, and determining the target node based on the data query instruction; and when the target node is abnormal, querying the data to be queried from the replica node.
According to another aspect of the embodiments of the present invention, there is also provided an apparatus for data processing of distributed metadata, including: an acquisition module, configured to acquire a plurality of data to be processed, wherein the data to be processed at least comprises data information and information of a plurality of file metadata corresponding to the data information, the data information represents position information of the data to be processed, and the position information at least comprises an identification code of the data to be processed and file names of a plurality of files corresponding to the file metadata; a calculation module, configured to perform hash calculation on the data information of the data to be processed to obtain a plurality of hash values; a determination module, configured to determine a plurality of target nodes based on the plurality of hash values; and a distribution module, configured to distribute the file metadata information corresponding to the data to be processed to the plurality of target nodes.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to execute the above-mentioned method for data processing of distributed metadata when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including one or more processors, and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the above-described method for data processing of distributed metadata.
According to another aspect of embodiments of the present invention, there is also provided a computer program product comprising a computer program/instructions which, when executed by a processor, implement the method of data processing of distributed metadata described above.
In the embodiment of the invention, hash calculation is performed on the identification code of the data to be processed and the file names of the plurality of files corresponding to the plurality of file metadata, and the file metadata information corresponding to the data to be processed is distributed to a plurality of target nodes. First, a plurality of data to be processed are obtained, wherein the data to be processed at least comprises data information and information of a plurality of file metadata corresponding to the data information, the data information represents position information of the data to be processed, and the position information at least comprises the identification code of the data to be processed and the file names of the plurality of files corresponding to the plurality of file metadata; hash calculation is then performed on the data information of the data to be processed to obtain a plurality of hash values; a plurality of target nodes are determined based on the plurality of hash values; and the file metadata information corresponding to the data to be processed is distributed to the plurality of target nodes.
In the process, hash calculation is carried out on the identification codes of the data to be processed and the file names of the files corresponding to the file metadata, so that the file metadata of the data to be processed can be distributed on the target node in a balanced manner through hash values, and the effects of dynamic distribution and load balance of the file metadata are achieved; in addition, the file metadata are uniformly distributed on the target nodes, when the file metadata of a plurality of files are acquired, the data acquisition tasks can be executed on the target nodes at the same time, and the file metadata can be acquired at the same time, so that the data acquisition efficiency is improved.
Therefore, the scheme provided by the application achieves the purpose of distributing the file metadata information corresponding to the data to be processed to a plurality of target nodes by performing hash calculation on the identification code of the data to be processed and the file names of a plurality of files corresponding to the file metadata, thereby realizing the technical effects of dynamic distribution and load balance of the file metadata and further solving the technical problem of unbalanced dynamic distribution of the file metadata in the prior art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of an alternative method of data processing of distributed metadata according to an embodiment of the invention;
FIG. 2 is a schematic diagram of an alternative method of data processing of distributed metadata according to an embodiment of the invention;
FIG. 3 is a schematic diagram of an alternative method of data processing of distributed metadata according to an embodiment of the invention;
FIG. 4 is a schematic diagram of an alternative apparatus for data processing of distributed metadata according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the related information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) related to the present invention are information and data authorized by the user or sufficiently authorized by each party. For example, an interface is provided between the system and the relevant user or institution, and before obtaining the relevant information, an obtaining request needs to be sent to the user or institution through the interface, and after receiving the consent information fed back by the user or institution, the relevant information needs to be obtained.
Example 1
In accordance with an embodiment of the present invention, there is provided an embodiment of a method of data processing of distributed metadata, it being noted that the steps illustrated in the flowchart of the figure may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than here.
For convenience of description, some terms or expressions referred to in the embodiments of the present application are explained below:
Distributed file storage system: a system that stores data on a plurality of physically dispersed storage nodes, manages and allocates the resources of those nodes in a unified manner, and provides a file system access interface to users.
File metadata: file metadata refers to system data used to describe characteristics of a file, such as access rights, the file owner, and the distribution information of the file's data blocks. In a distributed file system, the distribution information includes the location of the file on the disk and the location of the disk in the cluster. To operate on a file, a user must first obtain its metadata in order to locate the file and obtain the content or related attributes of the file.
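As an illustration of the definition above, file metadata can be pictured as a small record that is separate from the file content. The sketch below is only an assumption for illustration: the class names `FileMetadata` and `BlockLocation` and all of their fields are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class BlockLocation:
    """Where one data block lives (hypothetical fields)."""
    node_id: int   # which storage node in the cluster holds the disk
    disk_id: int   # which disk on that node
    offset: int    # byte offset of the block on that disk


@dataclass
class FileMetadata:
    """System data describing a file, not the file's content."""
    mode: int                                   # access rights, e.g. 0o644
    owner: str                                  # file owner
    blocks: List[BlockLocation] = field(default_factory=list)  # distribution info


# Example record for one file with a single data block.
meta = FileMetadata(
    mode=0o644,
    owner="alice",
    blocks=[BlockLocation(node_id=2, disk_id=0, offset=4096)],
)
```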
Fig. 1 is a schematic diagram of a method of data processing of distributed metadata according to an embodiment of the present invention, as shown in fig. 1, the method including the steps of:
Step S101, a plurality of data to be processed are obtained, wherein the data to be processed at least comprises data information and information of a plurality of file metadata corresponding to the data information, the data information represents position information of the data to be processed, and the position information at least comprises an identification code of the data to be processed and file names of a plurality of files corresponding to the file metadata.
In step S101, a plurality of pieces of data to be processed may be acquired based on a system, a server, an electronic device, or the like, and in this implementation, the plurality of pieces of data to be processed may be acquired by the system, for example, by a distributed file storage system.
Optionally, the data to be processed may be a directory, where the directory includes a parent directory, multiple files and sub-directories of the parent directory, and corresponding file metadata information. Optionally, the location information of the data to be processed is location information stored in the data to be processed, and includes an identification code of the parent directory and a name of the file, such as /root/dir1/file1, where file1 is the file name corresponding to the file metadata.
Step S102, carrying out Hash calculation on the data information of the data to be processed to obtain a plurality of Hash values.
In step S102, this embodiment organizes the metadata as KV (Key-Value) pairs: the identification code of a file's parent directory (that is, the ID of the parent directory) together with the file name serves as the Key, and the file metadata inode (index node) of the file or subdirectory serves as the Value. A plurality of servers form a metadata cluster, and when a file is operated on, the system performs hash calculation on the Key through the data processing module of the metadata cluster to obtain a plurality of hash values.
Step S103, a plurality of target nodes are determined based on the plurality of hash values.
Step S104, distributing the file metadata information corresponding to the data to be processed to a plurality of target nodes.
In steps S103-S104, the system determines the target nodes to be written based on the plurality of hash values, where a target node may be a metadata node, and then distributes the file metadata corresponding to the data to be processed to the metadata nodes for storage. For example, if N metadata nodes are determined, all files and directories are hashed evenly across the N metadata nodes.
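Steps S102 to S104 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the method of the application: the application does not name a hash function or say how a hash value is mapped to a node, so SHA-256 and a modulo over the node count are assumptions, and the function names (`hash_key`, `target_node`, `distribute`) and the directory IDs 10 and 11 are hypothetical.

```python
import hashlib
from typing import Dict, Tuple

Key = Tuple[int, str]  # (parent directory ID, file name)


def hash_key(key: Key) -> int:
    """Step S102: hash the KV key to an integer (SHA-256 is an assumption)."""
    parent_id, name = key
    digest = hashlib.sha256(f"{parent_id}/{name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def target_node(key: Key, num_nodes: int) -> int:
    """Step S103: map the hash value to a node index (modulo is an assumption)."""
    return hash_key(key) % num_nodes


def distribute(entries: Dict[Key, dict], num_nodes: int) -> Dict[int, Dict[Key, dict]]:
    """Step S104: store each (key, inode) pair on the node chosen by its hash."""
    nodes: Dict[int, Dict[Key, dict]] = {n: {} for n in range(num_nodes)}
    for key, inode in entries.items():
        nodes[target_node(key, num_nodes)][key] = inode
    return nodes


# Hypothetical directory IDs: /root/dir1 -> 10, /root/dir2 -> 11.
entries = {
    (10, "file1"): {"owner": "alice"},
    (10, "file2"): {"owner": "bob"},
    (11, "file1"): {"owner": "carol"},
}
nodes = distribute(entries, num_nodes=3)
```

Because the node index depends only on the key, any component that knows the parent directory ID and the file name can recompute where an entry lives without consulting a central table. Note that a plain modulo would move most keys if the number of nodes changed; how the application handles that case is not stated.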
Based on the scheme defined in steps S101 to S104 above, in the embodiment of the present invention, hash calculation is performed on the identification code of the data to be processed and the file names of the plurality of files corresponding to the plurality of file metadata, and the file metadata information corresponding to the data to be processed is distributed to a plurality of target nodes. First, a plurality of data to be processed are acquired, where the data to be processed at least includes data information and information of a plurality of file metadata corresponding to the data information, the data information represents location information of the data to be processed, and the location information at least includes the identification code of the data to be processed and the file names of the plurality of files corresponding to the plurality of file metadata; hash calculation is then performed on the data information of the data to be processed to obtain a plurality of hash values; a plurality of target nodes are determined based on the plurality of hash values; and the file metadata information corresponding to the data to be processed is distributed to the plurality of target nodes.
In the process, hash calculation is carried out on the identification codes of the data to be processed and the file names of the files corresponding to the file metadata, so that the file metadata of the data to be processed can be distributed on the target node in a balanced manner through hash values, and the effects of dynamic distribution and load balance of the file metadata are achieved; in addition, the file metadata are uniformly distributed on the target nodes, when the file metadata of a plurality of files are acquired, data acquisition tasks can be simultaneously executed on the target nodes, and the file metadata can be simultaneously acquired, so that the data acquisition efficiency is improved.
Therefore, the scheme provided by the application achieves the purpose of distributing the file metadata information corresponding to the data to be processed to a plurality of target nodes by performing hash calculation on the identification code of the data to be processed and the file names of a plurality of files corresponding to the file metadata, thereby realizing the technical effects of dynamic distribution and load balance of the file metadata and further solving the technical problem of unbalanced dynamic distribution of the file metadata in the prior art.
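The balancing effect described above can be checked informally with a small simulation. The sketch below assumes SHA-256 and a modulo mapping, neither of which is specified in the application, and uses a hypothetical parent directory ID of 10. All files are placed in a single directory, a case in which partitioning by sub-tree would put every entry on one node.

```python
import hashlib
from collections import Counter


def node_for(parent_id: int, name: str, num_nodes: int) -> int:
    """Hash (parent directory ID, file name) and reduce it to a node index."""
    digest = hashlib.sha256(f"{parent_id}/{name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


NUM_NODES = 4
NUM_FILES = 10_000

# Every file shares the same parent directory (hypothetical ID 10).
counts = Counter(node_for(10, f"file{i}", NUM_NODES) for i in range(NUM_FILES))

for node in range(NUM_NODES):
    print(f"node {node}: {counts[node]} entries")
```

With a well-mixed hash, each node is expected to receive close to NUM_FILES / NUM_NODES entries, even though all entries belong to one directory.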
In an optional embodiment, after distributing the file metadata information corresponding to the data to be processed to the plurality of target nodes, the system responds to a data query instruction and determines a query type corresponding to the data to be queried indicated by the data query instruction; then, when the query type is a directory query type, the system determines a plurality of target nodes corresponding to a target directory based on an identification code of the target directory, wherein the target directory at least comprises a plurality of files, the target directory indexes the plurality of files, and each file corresponds to one target node; finally, the system queries the data to be queried stored in each target node.
Optionally, when all files in the target directory are queried, the system determines the parent directory ID of the target directory in response to the data query instruction, simultaneously issues a data query task to each metadata node through the metadata processing module, and then reads file metadata information corresponding to all files and subdirectories in the directory. For example, as shown in fig. 2, the following files are stored in the target node of the system:
/root/dir1/subdir1/, /root/dir1/file1, /root/dir1/file2, /root/dir1/file2, /root/dir1/file3, /root/dir1/file4, /root/dir2/file1.
When all subdirectories and files under the directory /root/dir1/ in the system are queried, a data query task is issued to all metadata nodes through the metadata processing module, each metadata node is queried, and a data query result is returned. The data query result contains the information of all subdirectories and files under the directory /root/dir1/, namely:
/root/dir1/subdir1/, /root/dir1/file1, /root/dir1/file2, /root/dir1/file2, /root/dir1/file3, /root/dir1/file4.
It should be noted that the plurality of target nodes corresponding to the target directory are determined from the identification code of the target directory, and the data to be queried is obtained from each target node, so that the data query time is reduced and the efficiency of file metadata query is improved.
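The directory query described above can be sketched as a concurrent fan-out: every metadata node is asked for the entries whose key carries the target directory's ID, and the partial results are merged. This is a sketch under assumptions; the in-memory dictionaries standing in for metadata nodes, the function names, the thread pool and the directory IDs (10 for /root/dir1, 11 for /root/dir2) are all hypothetical, and the entries are placed on nodes by hand rather than by a hash.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Key = Tuple[int, str]  # (parent directory ID, entry name)


def scan_node(store: Dict[Key, dict], parent_id: int) -> List[str]:
    """Return the entry names on one node whose parent directory matches."""
    return [name for (pid, name) in store if pid == parent_id]


def list_directory(nodes: List[Dict[Key, dict]], parent_id: int) -> List[str]:
    """Issue the scan to every metadata node at once and merge the results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = pool.map(lambda store: scan_node(store, parent_id), nodes)
    return sorted(name for part in partials for name in part)


# Three stand-in metadata nodes holding entries of /root/dir1 (10) and /root/dir2 (11).
nodes = [
    {(10, "subdir1"): {}, (10, "file3"): {}},
    {(10, "file1"): {}, (11, "file1"): {}},
    {(10, "file2"): {}, (10, "file4"): {}},
]
print(list_directory(nodes, parent_id=10))
```

The entry /root/dir2/file1 is skipped because its key carries a different parent directory ID, which mirrors the example result above.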
Further, after determining the query type corresponding to the data to be queried indicated by the data query instruction, when the query type is a file query type, the system determines target information corresponding to the target file based on the position information of the target file; then determines a corresponding hash value based on the target information; and determines a target node based on the hash value, and queries the data to be queried from the target node.
Optionally, when a certain file in the system is queried, the file path of the data to be queried is determined from the data query instruction; the parent directory ID and the file name of the file are then obtained from the file path, a hash value is computed from the parent directory ID and the file name, the hash value is mapped to the corresponding metadata node, and the metadata information of the file is obtained from that metadata node.
It should be noted that, because the hash value is determined from the parent directory ID and the file name and is mapped to the metadata node that holds the file metadata information, the accuracy of data acquisition is improved.
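A single-file lookup along these lines might look as follows. The sketch is illustrative only: the table `DIR_IDS` that resolves a directory path to its ID, the SHA-256 and modulo mapping, and the function names are assumptions that do not appear in the application.

```python
import hashlib
import posixpath
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, str]  # (parent directory ID, file name)

# Hypothetical table resolving a directory path to its directory ID.
DIR_IDS = {"/root/dir1": 10, "/root/dir2": 11}


def node_for(key: Key, num_nodes: int) -> int:
    """Hash the key and reduce it to a node index (assumed mapping)."""
    parent_id, name = key
    digest = hashlib.sha256(f"{parent_id}/{name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def key_from_path(path: str) -> Key:
    """Split a file path into (parent directory ID, file name)."""
    parent, name = posixpath.split(path)
    return DIR_IDS[parent], name


def lookup(nodes: List[Dict[Key, dict]], path: str) -> Optional[dict]:
    """Recompute the hash from the path and read from that one node only."""
    key = key_from_path(path)
    return nodes[node_for(key, len(nodes))].get(key)


# The write path uses the same hash, so the read finds the entry directly.
nodes: List[Dict[Key, dict]] = [{} for _ in range(3)]
key = key_from_path("/root/dir1/file1")
nodes[node_for(key, 3)][key] = {"owner": "alice"}
print(lookup(nodes, "/root/dir1/file1"))
```

Only one node is contacted per lookup, because the same key always hashes to the same node.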
Further, after responding to a data query instruction and determining the query type corresponding to the data to be queried indicated by the data query instruction, when there are a plurality of target files, the system determines target information corresponding to each target file based on the position information of the plurality of target files; determines a plurality of hash values based on the target information corresponding to each target file; and determines a target node corresponding to each hash value based on the plurality of hash values, and queries the data to be queried from each target node.
Optionally, when a plurality of files in the system are queried, the metadata processing module issues a data query task to all nodes at the same time. The file path of each piece of data to be queried is determined from the data query instruction; the parent directory ID and the file name of each file are then obtained from its file path, a hash value is computed from each parent directory ID and file name, each hash value is mapped to the corresponding metadata node, and finally the metadata information of each file is obtained from its metadata node.
It should be noted that, by simultaneously acquiring file metadata information stored in a plurality of target nodes, data query time is reduced, and file metadata query efficiency is improved.
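A multi-file query can be sketched by hashing every key, grouping the keys by node, and reading from the nodes concurrently. This is a sketch under assumptions: the SHA-256 and modulo mapping, the thread pool, the function names and the directory IDs are hypothetical. As a simplification, it contacts only the nodes that own at least one requested key, whereas the text above describes issuing the task to all nodes.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Key = Tuple[int, str]  # (parent directory ID, file name)


def node_for(key: Key, num_nodes: int) -> int:
    """Hash the key and reduce it to a node index (assumed mapping)."""
    parent_id, name = key
    digest = hashlib.sha256(f"{parent_id}/{name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def read_from_node(store: Dict[Key, dict], keys: List[Key]) -> Dict[Key, dict]:
    """Read the requested keys that are present on one node."""
    return {k: store[k] for k in keys if k in store}


def lookup_many(nodes: List[Dict[Key, dict]], keys: List[Key]) -> Dict[Key, dict]:
    """Group keys by owning node, then query the nodes concurrently."""
    by_node: Dict[int, List[Key]] = defaultdict(list)
    for key in keys:
        by_node[node_for(key, len(nodes))].append(key)

    result: Dict[Key, dict] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(by_node))) as pool:
        futures = [pool.submit(read_from_node, nodes[n], ks)
                   for n, ks in by_node.items()]
        for future in futures:
            result.update(future.result())
    return result


# Hypothetical directory IDs: /root/dir1 -> 10, /root/dir2 -> 11.
nodes: List[Dict[Key, dict]] = [{} for _ in range(3)]
wanted = [(10, "file1"), (10, "file2"), (11, "file1")]
for k in wanted:
    nodes[node_for(k, 3)][k] = {"name": k[1]}
found = lookup_many(nodes, wanted)
```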
In another optional embodiment, after determining a plurality of target nodes based on the plurality of hash values, the system creates at least one replica node corresponding to each target node, and backs up the file metadata information corresponding to the data to be processed based on the replica node.
Further, after backing up the file metadata information corresponding to the data to be processed based on the replica node, the system responds to a data query instruction and determines the target node based on the data query instruction; then, when the target node is abnormal, the system queries the data to be queried from the replica node.
Optionally, in the prior art, if the node or storage medium holding certain metadata fails, some files or directories may be damaged, so that files cannot be acquired or are lost. For example, as shown in fig. 3, a plurality of target nodes exist in the system and store /root/dir1/file1, /root/dir1/file2 and /root/dir2/file1. Each target node corresponds to two replica nodes, which together form a metadata group, and the three nodes store the same file /root/dir1/file2. When the system needs to acquire file metadata information such as /root/dir1/file1, /root/dir1/file2 or /root/dir2/file1 and one of the nodes fails, the file metadata information can be acquired from the other replica nodes.
Optionally, when the system writes file metadata information, the write returns only after every replica node has been written completely; when reading, the data only needs to be returned from any one replica node.
It should be noted that by creating at least one replica node corresponding to each target node, the problems that file metadata information cannot be acquired and is lost when a node fails are solved, and thus the efficiency of file metadata management is improved.
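The replica behavior described above (write to every node of a metadata group, read from any one node, fall back when a node is abnormal) can be sketched as follows. The classes `MetadataNode` and `MetadataGroup`, the `healthy` flag used to simulate a failure, and the use of `ConnectionError` are all hypothetical; the application does not describe how failures are detected or how a failed write is handled.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, str]  # (parent directory ID, file name)


class MetadataNode:
    """Stand-in for one metadata node; `healthy` simulates a failure."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[Key, dict] = {}
        self.healthy = True

    def write(self, key: Key, inode: dict) -> None:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        self.store[key] = inode

    def read(self, key: Key) -> Optional[dict]:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return self.store.get(key)


class MetadataGroup:
    """A target node plus its replica nodes, all holding the same entries."""

    def __init__(self, members: List[MetadataNode]) -> None:
        self.members = members

    def write(self, key: Key, inode: dict) -> None:
        # The write returns only after every member has stored the entry.
        for node in self.members:
            node.write(key, inode)

    def read(self, key: Key) -> Optional[dict]:
        # Any one healthy member can answer; members that fail are skipped.
        for node in self.members:
            try:
                return node.read(key)
            except ConnectionError:
                continue
        raise ConnectionError("no healthy member in the metadata group")


group = MetadataGroup([MetadataNode("target"),
                       MetadataNode("replica-1"),
                       MetadataNode("replica-2")])
group.write((10, "file2"), {"owner": "bob"})
group.members[0].healthy = False       # simulate a failure of the target node
print(group.read((10, "file2")))       # served by a replica node
```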
Therefore, the invention provides a novel method for data processing of distributed metadata, which performs hash calculation on the identification code of the data to be processed and the file names of the plurality of files corresponding to the plurality of file metadata, and distributes the file metadata information corresponding to the data to be processed to a plurality of target nodes, thereby distributing massive file metadata evenly in a very simple manner; in addition, because the N metadata nodes execute tasks concurrently, performance improves N-fold (N being the number of metadata nodes), so that metadata management performance scales linearly.
Example 2
According to an embodiment of the present invention, an embodiment of an apparatus for data processing of distributed metadata is provided, where fig. 4 is a schematic diagram of an alternative apparatus for data processing of distributed metadata according to an embodiment of the present invention, as shown in fig. 4, the apparatus includes:
an obtaining module 401, configured to obtain multiple pieces of data to be processed, where the data to be processed at least includes data information and information of multiple pieces of file metadata corresponding to the data information, where the data information represents location information of the data to be processed, and the location information at least includes an identifier of the data to be processed and file names of multiple files corresponding to the multiple pieces of file metadata; a calculating module 402, configured to perform hash calculation on data information of the to-be-processed data to obtain multiple hash values; a determining module 403, configured to determine a plurality of target nodes based on the plurality of hash values; a distributing module 404, configured to distribute file metadata information corresponding to the to-be-processed data to multiple target nodes.
It should be noted that the obtaining module 401, the calculating module 402, the determining module 403, and the distributing module 404 correspond to steps S101 to S104 in the foregoing embodiment. The examples and application scenarios implemented by the four modules are the same as those of the corresponding steps, but are not limited to what is disclosed in embodiment 1.
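One way to picture how the four modules chain together is the following sketch, in which each function stands in for one module. It is not the apparatus itself: the `PendingItem` fields, the function names, the SHA-256 hash, and the modulo mapping from hash value to node are hypothetical choices, and each node is modelled as an in-memory dictionary.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PendingItem:
    """One piece of data to be processed (field names are hypothetical)."""
    dir_id: str      # identification code of the data to be processed
    file_name: str   # name of the file the metadata belongs to
    metadata: dict   # the file metadata information itself


def obtain(raw_records):
    """Stand-in for obtaining module 401: wrap raw tuples as PendingItem objects."""
    return [PendingItem(d, n, m) for d, n, m in raw_records]


def calculate(items):
    """Stand-in for calculating module 402: one hash value per item (SHA-256 assumed)."""
    return [
        int.from_bytes(
            hashlib.sha256(f"{it.dir_id}/{it.file_name}".encode("utf-8")).digest()[:8],
            "big",
        )
        for it in items
    ]


def determine(hash_values, num_nodes):
    """Stand-in for determining module 403: map each hash value to a node index."""
    return [h % num_nodes for h in hash_values]


def distribute(items, targets, num_nodes):
    """Stand-in for distributing module 404: store each record on its target node."""
    nodes = [dict() for _ in range(num_nodes)]
    for item, target in zip(items, targets):
        nodes[target][(item.dir_id, item.file_name)] = item.metadata
    return nodes
```

A caller would run the stages in order, for example `items = obtain(raw)`, `targets = determine(calculate(items), 3)`, then `distribute(items, targets, 3)`, mirroring steps S101 to S104.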
Optionally, the apparatus for data processing of distributed metadata further includes a first response module, a first determining module, and a first query module. The first response module is configured to respond to a data query instruction and determine a query type corresponding to the data to be queried indicated by the data query instruction. The first determining module is configured to determine, when the query type is a directory query type, a plurality of target nodes corresponding to a target directory based on an identification code of the target directory, where the target directory includes at least the plurality of files, the target directory is used to index the plurality of files, and each file corresponds to one target node. The first query module is configured to query the data to be queried stored in each target node.
Optionally, the apparatus for data processing of distributed metadata further includes a second determining module, a third determining module, and a second query module. The second determining module is configured to determine, when the query type is a file query type, target information corresponding to a target file based on the location information of the target file. The third determining module is configured to determine a corresponding hash value based on the target information. The second query module is configured to determine a target node based on the hash value and query the data to be queried from the target node.
Optionally, the apparatus for data processing of distributed metadata further includes a fourth determining module, a fifth determining module, and a third query module. The fourth determining module is configured to determine, when there are a plurality of target files, target information corresponding to each target file based on the location information of the plurality of target files. The fifth determining module is configured to determine a plurality of hash values based on the target information corresponding to each target file. The third query module is configured to determine a target node corresponding to each hash value based on the plurality of hash values and query the data to be queried from each target node.
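The three query paths described above (directory query, single-file query, multi-file query) can be sketched as follows. This is an illustrative model only. The source does not say how the identification code of a directory is resolved to its target nodes, so the directory query here simply fans out to every node and filters by directory; that fan-out, the SHA-256 hash, the fixed `NUM_NODES`, and all function names are assumptions.

```python
import hashlib

NUM_NODES = 4  # hypothetical cluster size

# Each metadata node is modelled as a dict: (dir_id, file_name) -> metadata.
nodes = [dict() for _ in range(NUM_NODES)]


def node_for(dir_id: str, file_name: str) -> int:
    """Recompute the target node from the file's location information."""
    key = f"{dir_id}/{file_name}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % NUM_NODES


def put(dir_id: str, file_name: str, metadata: dict) -> None:
    """Store one metadata record on its target node."""
    nodes[node_for(dir_id, file_name)][(dir_id, file_name)] = metadata


def query_file(dir_id: str, file_name: str):
    """File query: hash the target information and ask exactly one node."""
    return nodes[node_for(dir_id, file_name)].get((dir_id, file_name))


def query_files(dir_id: str, file_names: list[str]) -> dict:
    """Multi-file query: one hash value, and hence one target node, per file."""
    return {name: query_file(dir_id, name) for name in file_names}


def query_directory(dir_id: str) -> dict:
    """Directory query. Assumption: fan out to all nodes and filter by dir_id."""
    result = {}
    for node in nodes:
        for (d, name), metadata in node.items():
            if d == dir_id:
                result[name] = metadata
    return result
```

In this model a file query touches a single node because the same hash calculation used at write time is repeated at read time, while a directory query has to collect records from several nodes because the files of one directory are spread across the cluster.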
Optionally, the apparatus for data processing of distributed metadata further includes a creating module and a backup module. The creating module is configured to create at least one replica node corresponding to each target node. The backup module is configured to back up the file metadata information corresponding to the data to be processed based on the replica node.
Optionally, the apparatus for data processing of distributed metadata further includes a first response module and a fourth query module. The first response module is configured to respond to a data query instruction and determine the target node based on the data query instruction. The fourth query module is configured to query the data to be queried from the replica node when the target node is abnormal.
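The replica backup and failover behaviour described above can be sketched with a small in-memory model. The source does not state how replica nodes are chosen or how an abnormal node is detected. In this sketch the replicas of a target node are simply the next nodes in index order, and failure is simulated with an explicit `mark_down` call. Both are assumptions, as are the class and method names.

```python
import hashlib


class MetadataCluster:
    """Toy model of target nodes with replica nodes (names are hypothetical)."""

    def __init__(self, num_nodes: int, num_replicas: int = 1):
        if not 0 < num_replicas < num_nodes:
            raise ValueError("need at least one replica and more nodes than replicas")
        self.num_nodes = num_nodes
        self.num_replicas = num_replicas
        self.stores = [dict() for _ in range(num_nodes)]
        self.down = set()  # indices of nodes currently treated as abnormal

    def primary(self, dir_id: str, file_name: str) -> int:
        """Target node chosen by hashing the location information."""
        key = f"{dir_id}/{file_name}".encode("utf-8")
        return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % self.num_nodes

    def replicas(self, primary: int) -> list[int]:
        """Assumption: replicas are the next nodes in index order, wrapping around."""
        return [(primary + i) % self.num_nodes for i in range(1, self.num_replicas + 1)]

    def put(self, dir_id: str, file_name: str, metadata: dict) -> None:
        """Write to the target node and back up to each replica node."""
        primary = self.primary(dir_id, file_name)
        for node in [primary, *self.replicas(primary)]:
            self.stores[node][(dir_id, file_name)] = metadata

    def mark_down(self, node: int) -> None:
        """Simulate an abnormal node."""
        self.down.add(node)

    def get(self, dir_id: str, file_name: str):
        """Query the target node, falling back to a replica if it is abnormal."""
        primary = self.primary(dir_id, file_name)
        for node in [primary, *self.replicas(primary)]:
            if node not in self.down:
                return self.stores[node].get((dir_id, file_name))
        raise RuntimeError("target node and all replica nodes are unavailable")
```

With one replica per target node, a query still succeeds after the target node is marked down, and it fails only when the replica is unavailable as well.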
Example 3
According to another aspect of embodiments of the present invention, there is also provided a computer-readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the above-mentioned method of data processing of distributed metadata when executed.
Example 4
According to another aspect of the embodiments of the present invention, there is also provided an electronic device. Fig. 5 is a schematic diagram of an optional electronic device according to an embodiment of the present invention. As shown in Fig. 5, the electronic device includes one or more processors, and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the above-described method of data processing of distributed metadata.
Example 5
According to another aspect of embodiments of the present invention, there is also provided a computer program product comprising a computer program/instructions which, when executed by a processor, implement the method of data processing of distributed metadata described above.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, a division of a unit may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or may not be executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make a number of modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method of data processing of distributed metadata, comprising:
acquiring a plurality of data to be processed, wherein the data to be processed at least comprises data information and information of a plurality of file metadata corresponding to the data information, the data information represents position information of the data to be processed, and the position information at least comprises an identification code of the data to be processed and file names of a plurality of files corresponding to the file metadata;
performing hash calculation on the data information of the data to be processed to obtain a plurality of hash values;
determining a plurality of target nodes based on the plurality of hash values;
and distributing file metadata information corresponding to the data to be processed to the target nodes.
2. The method according to claim 1, wherein after distributing file metadata information corresponding to the data to be processed to the plurality of target nodes, the method further comprises:
responding to a data query instruction, and determining a query type corresponding to the data to be queried indicated by the data query instruction;
when the query type is a directory query type, determining a plurality of target nodes corresponding to a target directory based on an identification code of the target directory, wherein the target directory at least comprises a plurality of files, the target directory is used to index the plurality of files, and each file corresponds to one target node;
and querying the data to be queried stored in each target node.
3. The method according to claim 2, wherein after determining the query type corresponding to the data to be queried indicated by the data query instruction, the method further comprises:
when the query type is a file query type, determining target information corresponding to a target file based on the position information of the target file;
determining a corresponding hash value based on the target information;
and determining a target node based on the hash value, and querying the data to be queried from the target node.
4. The method of claim 3, wherein after determining, in response to a data query instruction, a query type corresponding to the data to be queried indicated by the data query instruction, the method further comprises:
when there are a plurality of target files, determining target information corresponding to each target file based on the position information of the plurality of target files;
determining a plurality of hash values based on the target information corresponding to each target file;
and determining a target node corresponding to each hash value based on the plurality of hash values, and querying the data to be queried from each target node.
5. The method of claim 1, wherein after determining a plurality of target nodes based on the plurality of hash values, the method further comprises:
creating at least one replica node corresponding to each target node;
and backing up file metadata information corresponding to the data to be processed based on the replica node.
6. The method of claim 5, wherein after backing up file metadata information corresponding to the data to be processed based on the replica node, the method further comprises:
responding to a data query instruction, and determining the target node based on the data query instruction;
and when the target node is abnormal, querying the data to be queried from the replica node.
7. An apparatus for data processing of distributed metadata, comprising:
an acquisition module, configured to acquire a plurality of data to be processed, wherein the data to be processed at least comprises data information and information of a plurality of file metadata corresponding to the data information, the data information represents position information of the data to be processed, and the position information at least comprises an identification code of the data to be processed and file names of a plurality of files corresponding to the file metadata;
a calculation module, configured to perform hash calculation on the data information of the data to be processed to obtain a plurality of hash values;
a determination module, configured to determine a plurality of target nodes based on the plurality of hash values;
and a distribution module, configured to distribute file metadata information corresponding to the data to be processed to the plurality of target nodes.
8. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is arranged to execute a method of data processing of distributed metadata as claimed in any one of claims 1 to 6 when executed.
9. An electronic device, wherein the electronic device comprises one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the method of data processing of distributed metadata as claimed in any one of claims 1 to 6.
10. A computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement a method of data processing of distributed metadata according to any of claims 1 to 6.
CN202211170926.6A 2022-09-23 2022-09-23 Method and device for data processing of distributed metadata and electronic equipment Pending CN115563073A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211170926.6A CN115563073A (en) 2022-09-23 2022-09-23 Method and device for data processing of distributed metadata and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211170926.6A CN115563073A (en) 2022-09-23 2022-09-23 Method and device for data processing of distributed metadata and electronic equipment

Publications (1)

Publication Number Publication Date
CN115563073A true CN115563073A (en) 2023-01-03

Family

ID=84743110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211170926.6A Pending CN115563073A (en) 2022-09-23 2022-09-23 Method and device for data processing of distributed metadata and electronic equipment

Country Status (1)

Country Link
CN (1) CN115563073A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115952005A (en) * 2023-02-24 2023-04-11 浪潮电子信息产业股份有限公司 Metadata load balancing method, device, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
US7020665B2 (en) File availability in distributed file storage systems
JP5798248B2 (en) System and method for implementing a scalable data storage service
US11176102B2 (en) Incremental virtual machine metadata extraction
CN108804253B (en) Parallel operation backup method for mass data backup
US20110022566A1 (en) File system
US20090012932A1 (en) Method and System For Data Storage And Management
JP2020525906A (en) Database tenant migration system and method
US20060074964A1 (en) Index processing
CN106484820B (en) Renaming method, access method and device
CN108196787B (en) Quota management method of cluster storage system and cluster storage system
US8832030B1 (en) Sharepoint granular level recoveries
CN105468720A (en) Method for integrating distributed data processing systems, corresponding systems and data processing method
US11809281B2 (en) Metadata management for scaled and high density backup environments
Chen et al. Bestpeer++: A peer-to-peer based large-scale data processing platform
Liu et al. Cfs: A distributed file system for large scale container platforms
Dev et al. Dr. Hadoop: an infinite scalable metadata management for Hadoop—How the baby elephant becomes immortal
US9177034B2 (en) Searchable data in an object storage system
Gao et al. An efficient ring-based metadata management policy for large-scale distributed file systems
CN115858488A (en) Parallel migration method and device based on data governance and readable medium
JP2021529379A (en) Search server centralized storage
CN115563073A (en) Method and device for data processing of distributed metadata and electronic equipment
Avilés-González et al. Scalable metadata management through OSD+ devices
CN115495432A (en) Method, device and equipment for supporting multiple instances
Liu et al. AngleCut: A ring-based hashing scheme for distributed metadata management
Dai et al. Managing rich metadata in high-performance computing systems using a graph model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination