CN111427841A - Data management method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN111427841A
Authority
CN
China
Prior art keywords
data
file
read
block
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010120131.9A
Other languages
Chinese (zh)
Other versions
CN111427841B (en)
Inventor
刘昌鑫
李立帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010120131.9A (CN111427841B)
Priority to PCT/CN2020/098793 (WO2021169113A1)
Publication of CN111427841A
Application granted
Publication of CN111427841B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/10 File systems; File servers
              • G06F16/11 File system administration, e.g. details of archiving or snapshots
                • G06F16/122 File system administration, e.g. details of archiving or snapshots using management policies
              • G06F16/13 File access structures, e.g. distributed indices
              • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
              • G06F16/17 Details of further file system functions
                • G06F16/172 Caching, prefetching or hoarding of files
              • G06F16/18 File system types
                • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data management method, a data management apparatus, computer equipment and a storage medium. The method comprises: storing a directory tree of a file system in a metadata server; partitioning the file data of each file in the file system into blocks and storing the blocks across a plurality of nodes of a cluster server; partitioning the attribute information of the file system into blocks and storing those blocks across a plurality of nodes of the cluster server; acquiring a data read-write instruction sent by a user; judging whether the directory tree contains the specified directory information; if so, extracting the specified file identifier corresponding to the specified directory information from the metadata server; determining, according to the specified file identifier, each designated node of the cluster server on which the file data of the file to be read or written is distributed; and completing the read-write operation with each designated node according to the read-write operation information. The method reduces the operating pressure on the metadata server and improves data reading efficiency in high-concurrency scenarios.

Description

Data management method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data management method, apparatus, computer device, and storage medium.
Background
A file system typically contains a large amount of file data distributed across its directories and subdirectories. A directory may contain many subdirectories and files, and each subdirectory may in turn contain many files. The metadata of the file system includes a directory tree and attribute information. The directory tree records the mapping between the logical and physical locations of the file data, and the attribute information records attributes such as file size, modification time and read-write permission. In the related art, directory entry metadata, file data and attribute information are generally managed centrally. In a high-concurrency scenario, mutually exclusive data access then limits service concurrency, so data reads take a long time and system efficiency is low.
Disclosure of Invention
The present application mainly aims to provide a data management method, an apparatus, a computer device and a storage medium, and aims to solve the problems of centralized management of file data, long time consumption for data reading and low system efficiency in the prior art.
The application provides a data management method, which comprises the following steps:
storing a directory tree of a file system in a metadata server; partitioning the file data of each file contained in the file system, and respectively storing each block in a plurality of nodes of a cluster server; partitioning the attribute information corresponding to the file system, and respectively storing each block in a plurality of nodes of the cluster server;
acquiring a data reading and writing instruction sent by a user, wherein the data reading and writing instruction carries specified directory information and reading and writing operation information of a file to be read and written;
judging whether the directory tree contains the specified directory information or not, wherein the directory tree is stored in a key-value mode, the directory information of each file is taken as a key, and a file identifier corresponding to each file is taken as a value;
if yes, extracting a specified file identifier corresponding to the specified directory information from the metadata server;
determining, according to the specified file identifier, each designated node of the cluster server on which the file data corresponding to the file to be read and written is distributed;
and completing the read-write operation with each designated node according to the read-write operation information.
Further, the step of blocking file data of each file included in the file system and storing each block in a plurality of nodes of the cluster server includes:
dividing the file data of each file into a plurality of first blocks of data according to a first preset size;
determining each first node for storing each first block of data from all nodes of the cluster server according to a preset hash algorithm and respectively according to the number of the first blocks of data corresponding to each file, and establishing a first mapping relation between each first block of data and each first node;
and respectively storing the first block data corresponding to each file to each first node according to the first mapping relation.
Further, the read-write operation information includes a data read start value and a data read length value, and the step of completing the read-write operation with each of the designated nodes according to the read-write operation information includes:
extracting the data reading starting point value and the data reading length value from the read-write operation information;
determining a first target node corresponding to the data reading starting point value from each designated node;
in the first target node, reading the read data corresponding to the data read length value from the first block of data corresponding to the first target node with the data read start point value as a start point.
Further, the read-write operation information includes a data operation start point value and data write-in information, and the step of completing the read-write operation with each of the designated nodes according to the read-write operation information includes:
extracting the data operation starting point value and the data writing information from the read-write operation information;
determining a second target node corresponding to the data operation starting point value from a plurality of designated nodes;
in the second target node, with the data operation starting point value as a starting point, writing the data writing information into the first block data corresponding to the second target node to obtain update block data corresponding to the second target node;
and reallocating, according to a preset hash algorithm, the nodes of the cluster server on which the updated block data is stored.
Further, the step of determining, according to a preset hash algorithm, each first node for storing each first block of data from all nodes of the cluster server according to the number of the first block of data corresponding to each file, and establishing a first mapping relationship between each first block of data and each first node includes:
numbering all nodes in the cluster server in sequence according to Arabic numbers starting from 0 to respectively obtain the number of each node;
numbering all first block data corresponding to one file in sequence according to a preset numbering rule to obtain all file block numbers corresponding to the file;
according to the preset hash algorithm, respectively carrying out hash calculation on a file identifier corresponding to one file and a file block number corresponding to each first block of data corresponding to the file identifier, and respectively obtaining a hash value corresponding to each first block of data;
taking the hash value corresponding to each first block of data modulo the maximum node number to obtain a modulo operation result;
and mapping each first block of data to each first node according to a preset mapping rule, wherein the preset mapping rule is that the modulo operation result of the first block of data is equal to the node number of the first node.
Further, the metadata server is a distributed metadata server cluster, and the step of storing the directory tree of the file system in the metadata server includes:
storing the directory tree of the file system in each metadata server in the metadata server cluster, wherein one metadata server is used as a main metadata server, and the other metadata servers are used as auxiliary metadata servers, wherein the main metadata server is used for providing metadata service for the outside;
judging whether the main metadata server fails or not;
and if the main metadata server fails, selecting one of the auxiliary metadata servers as a new main metadata server, and using the new main metadata server to provide metadata service for the outside.
Further, after the step of determining whether the directory tree includes the specified directory information, the method includes:
if not, distributing a corresponding new file identifier for the specified directory information in the directory tree;
and distributing nodes for storing the file data corresponding to the new file identification for the new file identification through a preset hash algorithm.
The present application further provides a data management apparatus, including:
the data storage unit is used for storing the directory tree of the file system in the metadata server; partitioning the file data of each file contained in the file system, and respectively storing each block in a plurality of nodes of a cluster server; partitioning the attribute information corresponding to the file system, and respectively storing each block in a plurality of nodes of the cluster server;
the instruction acquisition unit is used for acquiring a data read-write instruction sent by a user, wherein the data read-write instruction carries specified directory information and read-write operation information of a file to be read and written;
the judging unit is used for judging whether the specified directory information is contained in the directory tree or not, wherein the directory tree is stored in a key-value mode, the directory information of each file is taken as a key, and the file identifier corresponding to each file is taken as a value;
the identification extracting unit is used for extracting the specified file identification corresponding to the specified directory information from the metadata server if the specified directory information is contained;
the node determining unit is used for determining, according to the specified file identifier, each designated node of the cluster server on which the file data corresponding to the file to be read and written is distributed;
and the read-write operation unit is used for completing the read-write operation with each designated node according to the read-write operation information.
The present application further proposes a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of any of the above-mentioned methods when executing the computer program.
The present application also proposes a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any of the above.
The beneficial effect of this application:
According to the data management method, the data management apparatus, the computer equipment and the storage medium, the directory tree of the file system is stored in the metadata server; the file data of each file contained in the file system is partitioned into blocks, and each block is stored in a plurality of nodes of the cluster server; the attribute information corresponding to the file system is partitioned into blocks, and each block is stored in a plurality of nodes of the cluster server. When a data read-write operation is carried out, only the specified file identifier corresponding to the file to be read and written is looked up in the metadata server; the read-write operation is then performed on the designated nodes corresponding to that file according to the specified file identifier, and no read-write operation needs to be performed on the metadata server. On the one hand, this greatly reduces the operating pressure on the metadata server; on the other hand, it greatly improves data reading efficiency in high-concurrency scenarios.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a data management method according to an embodiment of the present application;
FIG. 2 is a block diagram schematically illustrating a structure of a data management apparatus according to an embodiment of the present application;
FIG. 3 is a block diagram schematically illustrating a structure of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, an embodiment of the present application provides a data management method, including:
S1, storing the directory tree of the file system in a metadata server; partitioning the file data of each file contained in the file system, and respectively storing each block in a plurality of nodes of a cluster server; partitioning the attribute information corresponding to the file system, and respectively storing each block in a plurality of nodes of the cluster server;
S2, acquiring a data read-write instruction sent by a user, wherein the data read-write instruction carries specified directory information and read-write operation information of a file to be read and written;
S3, judging whether the directory tree contains the specified directory information, wherein the directory tree is stored in a key-value form, the directory information of each file is taken as a key, and the file identifier corresponding to each file is taken as a value;
S4, if yes, extracting the specified file identifier corresponding to the specified directory information from the metadata server;
S5, determining, according to the specified file identifier, each designated node of the cluster server on which the file data corresponding to the file to be read and written is distributed;
S6, completing the read-write operation with each designated node according to the read-write operation information.
In this embodiment, in step S1, the file data is the data making up the content of the files in the file system, and the attribute information is the data recording attributes such as file size, modification time and read-write permission. The directory tree is stored in a dedicated metadata server, the file data is stored in a distributed manner across a plurality of nodes of the cluster server, and the attribute information is likewise stored in a distributed manner across a plurality of nodes of the cluster server.
The metadata server is a server dedicated to storing the directory tree. It may be a single server, or a distributed metadata server cluster made up of a plurality of metadata servers. In the cluster, the metadata servers are fully peer-to-peer: each is capable of independently providing the metadata service externally, and the data on all of them is kept synchronized. When a distributed metadata server cluster is used, one metadata server acts as the main metadata server and provides the metadata service externally, while the others keep their data synchronously updated but do not provide the metadata service externally. When the main metadata server fails, one of the other metadata servers is selected as the new main metadata server, which ensures the reliability of the metadata service.
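The main/auxiliary arrangement described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names `MetadataServer` and `MetadataCluster`, the `healthy` flag used as a stand-in for failure detection, and the "first healthy server wins" selection policy are all assumptions, since the text only says that one of the other servers is selected.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataServer:
    """Hypothetical metadata server holding one replica of the directory tree."""
    name: str
    healthy: bool = True
    tree: dict = field(default_factory=dict)


class MetadataCluster:
    """Hypothetical cluster: one main server answers, the rest stay in sync."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("cluster needs at least one metadata server")
        self.servers = list(servers)
        self.primary = self.servers[0]

    def put(self, key, value):
        # Simplification: every replica is updated synchronously.
        for server in self.servers:
            server.tree[key] = value

    def get(self, key):
        self._ensure_primary()
        return self.primary.tree.get(key)

    def _ensure_primary(self):
        # Assumed policy: promote the first healthy server when the main one fails.
        if self.primary.healthy:
            return
        for server in self.servers:
            if server.healthy:
                self.primary = server
                return
        raise RuntimeError("no healthy metadata server available")
```

Because every replica already holds the full directory tree, promoting an auxiliary server requires no data transfer in this sketch; only the choice of which server answers changes.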
The cluster server comprises a plurality of nodes and is used for storing file data and attribute information in a distributed mode. In a cluster server, each node stores a portion of data, and a plurality of nodes collectively constitute complete data. The distributed cluster server respectively disperses the file data and the attribute information to a plurality of nodes. When a user needs to read and write data of a certain file, the data can be read and written at the corresponding node without reading and writing from the metadata server, and the pressure of the metadata server is effectively reduced.
In the above steps S2 to S4, the directory tree is stored in the metadata server in key-value form. Key-value storage gives the distributed storage management of the file system good scalability: when more file data needs to be handled, the mapping for the new file data is simply added to the key-value directory tree. Table 1 is a specific example of a directory tree stored in key-value form, for a file system that has sub-folders dir1, dir2 and dir3 under folder FS1, where the sub-folder dir3 contains the files file1 and file2.
TABLE 1 File system directory tree
Key                 Value
FS1                 0001
0001/dir1           0002
0001/dir2           0003
0001/dir3           0004
0001/0004/file1     0005
0001/0004/file2     0006
When a user wants to open file2 and perform a read-write operation on it, the data read-write instruction carries the specified directory information of the file to be read and written, namely <FS1/dir3/file2>. For the example in Table 1, the metadata server first finds that the value of the key "FS1" is "0001"; it then looks up the directory named "dir3" under the prefix "0001", that is, the key "0001/dir3", whose value is "0004"; finally it looks up the file named "file2" under the prefix "0001/0004", that is, the key "0001/0004/file2", whose value is "0006". When the metadata server finds a file identifier corresponding to the specified directory information, the directory tree is judged to contain the specified directory information, and that identifier is extracted as the specified file identifier; in the example of Table 1 it is the value "0006". The logical location of the file data on the cluster server is then determined from the value "0006".
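The lookup walk just described can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary `DIRECTORY_TREE` reproduces the entries of Table 1, while the function name `resolve` and the choice to return `None` for a missing path are hypothetical and not taken from the application.

```python
# Directory tree from Table 1, stored as key -> value (file identifier).
DIRECTORY_TREE = {
    "FS1": "0001",
    "0001/dir1": "0002",
    "0001/dir2": "0003",
    "0001/dir3": "0004",
    "0001/0004/file1": "0005",
    "0001/0004/file2": "0006",
}


def resolve(tree, path):
    """Return the identifier stored for `path`, or None if a component is missing."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        return None
    ident = tree.get(parts[0])  # e.g. "FS1" -> "0001"
    if ident is None:
        return None
    prefix = ident
    for name in parts[1:]:
        # The key is the identifier chain of the parents plus the child's name.
        ident = tree.get(f"{prefix}/{name}")
        if ident is None:
            return None
        prefix = f"{prefix}/{ident}"
    return ident
```

For the instruction carrying `FS1/dir3/file2`, the walk visits the keys `FS1`, `0001/dir3` and `0001/0004/file2` in turn, matching the three lookups described in the text.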
In the above steps S5-S6, the designated nodes of the cluster server across which the file to be read and written is distributed are determined from the specified file identifier of that file; that is, the specified file identifier identifies the specific nodes on which the corresponding file data is stored. The corresponding read-write operation is then completed on the designated nodes according to the read-write operation information in the data read-write instruction.
In the data management method of this embodiment, the directory tree of the file system is stored in the metadata server; the file data of each file contained in the file system is partitioned into blocks, and each block is stored in a plurality of nodes of the cluster server; the attribute information corresponding to the file system is partitioned into blocks, and each block is stored in a plurality of nodes of the cluster server. When a data read-write operation is carried out, a data read-write instruction sent by a user is first acquired, the instruction carrying the specified directory information and the read-write operation information of the file to be read and written; when the directory tree contains the specified directory information, the specified file identifier corresponding to it is extracted from the metadata server; the designated nodes of the cluster server on which the file data is distributed are determined according to the specified file identifier; finally, the read-write operation is completed with the designated nodes according to the read-write operation information. Only the specified file identifier is looked up in the metadata server; the read-write operation itself is performed on the designated nodes and never on the metadata server. On the one hand, this greatly reduces the operating pressure on the metadata server; on the other hand, it greatly improves data reading efficiency in high-concurrency scenarios.
In an embodiment, the step S1 of partitioning the file data of each file included in the file system and storing each block in each of the plurality of nodes of the cluster server includes:
S101, dividing the file data of each file into a plurality of first blocks of data according to a first preset size;
S102, determining, according to a preset hash algorithm and according to the number of each first block of data corresponding to each file, each first node for storing each first block of data from all nodes of the cluster server, and establishing a first mapping relation between each first block of data and each first node;
S103, respectively storing the first block data corresponding to each file to each first node according to the first mapping relation.
In this embodiment, the plurality of first block data are distributed to the plurality of nodes of the cluster server by a preset hash algorithm. A first node is a node selected from the cluster server that stores a single first block of data. The first block data of each file are allocated by the preset hash algorithm, each first block of data is located to its corresponding first node, and a mapping relation between each first block of data and each first node is established. This keeps the load across the nodes of the cluster server as uniform as possible and avoids the situation in which some nodes are overloaded while others are nearly idle.
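Steps S101 to S103 can be sketched as below. This is a minimal sketch under assumptions: nodes are modelled as in-memory dictionaries, and the names `split_into_blocks`, `store_file` and the pluggable `place` callable are hypothetical. The placement rule is passed in rather than fixed, because the preset hash algorithm is not specified at this point in the text.

```python
def split_into_blocks(data, block_size):
    """S101: cut the file data into blocks of at most `block_size` bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store_file(file_id, data, block_size, nodes, place):
    """S102-S103: place each block on a node and return the first mapping.

    nodes: list of dicts, one per node, standing in for node storage.
    place: callable (file_id, block_no, node_count) -> node number.
    """
    mapping = {}
    for block_no, block in enumerate(split_into_blocks(data, block_size)):
        node_no = place(file_id, block_no, len(nodes))
        nodes[node_no][(file_id, block_no)] = block  # store the block on its node
        mapping[block_no] = node_no                  # block number -> node number
    return mapping
```

The returned dictionary plays the role of the first mapping relation: later reads and writes only need the block number to find the node that holds the block.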
In an embodiment, the step S6 of completing the read-write operation with each designated node according to the read-write operation information includes:
S601, extracting the data reading start point value and the data reading length value from the read-write operation information;
S602, determining a first target node corresponding to the data reading start point value from each designated node;
S603, in the first target node, reading the read data corresponding to the data reading length value from the first block of data corresponding to the first target node, taking the data reading start point value as the start point.
In this embodiment, in step S601, when the user needs to read a certain data segment from the file to be read and written, the corresponding data reading start point value and data reading length value are written into the read-write operation information. As a specific example, if a user wants to read a data segment of length 2M starting at an offset of 60M from the first character of the file, the data reading start point value in the read-write operation information is <off = 60M> and the data reading length value is <len = 2M>.
In step S602, the first target node where the data reading start point is located is determined according to the data reading start point value. Continuing the example, step S5 has already determined, from the specified file identifier, the designated nodes across which the file is distributed; step S602 determines on which of these designated nodes the data reading start point <off = 60M> lies, and that node is the first target node. Specifically, the data reading start point value is matched against the first mapping relationship between the first block data and the first nodes. For example, if the file data was divided sequentially into first block data of size 25M, then <off = 60M> falls in the 3rd first block data (which covers offsets 50M to 75M); looking up that block in the first mapping relationship yields the corresponding first node, which is the first target node.
In step S603, in the first target node, read data corresponding to the data read length value is read from the data read start point value. The reading operation of the data is completed in the nodes of the cluster server, and the operation in the metadata server is not needed, so that the pressure of the metadata server is greatly reduced, and the high-concurrency scene service is facilitated.
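The offset arithmetic behind steps S601 to S603 can be sketched as follows. The function names `locate` and `read_range` are hypothetical, and blocks are modelled as an in-memory dictionary from block number to bytes. One extension beyond the text is labelled here as an assumption: `read_range` continues into the following blocks when the requested length runs past the end of the first block, a case the description does not spell out.

```python
MB = 1024 * 1024


def locate(offset, block_size):
    """Return (zero-based block number, offset inside that block)."""
    return divmod(offset, block_size)


def read_range(blocks, block_size, offset, length):
    """Read `length` bytes starting at `offset` from blocks {block_no: bytes}.

    Assumption: a read that crosses a block boundary continues in the next block.
    """
    out = bytearray()
    block_no, inner = locate(offset, block_size)
    while length > 0 and block_no in blocks:
        piece = blocks[block_no][inner:inner + length]
        if not piece:
            break  # offset lies past the end of the stored data
        out += piece
        length -= len(piece)
        block_no += 1
        inner = 0
    return bytes(out)
```

With 25M blocks, `locate(60 * MB, 25 * MB)` gives zero-based block number 2, which is the 3rd block in the text's counting, with the read starting 10M into that block.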
In an embodiment, the step S6 of completing the read/write operation with each designated node according to the read/write operation information includes:
s611, extracting the data operation starting point value and the data writing information from the read-write operation information;
s612, determining a second target node corresponding to the data operation starting point value from the designated nodes;
s613, in the second target node, writing the data write information into the first block data corresponding to the second target node using the data operation start point value as a start point, to obtain update block data corresponding to the second target node;
and S614, redistributing each node of the updated block data stored in the cluster server according to a preset hash algorithm.
In this embodiment, in the step S611, when the user needs to write a certain data segment into the file data to be read and written, the corresponding data operation start point value and data write information may be written into the read and write operation information. In a specific example, for example, a user wants to write a data segment with a content of < xxxx > from a data point with an offset of 60M from a first character in a file to be read and written, a data read start point value in corresponding read and write operation information is < off ═ 60M >, and data write information is < xxxx >.
In step S612, the second target node where the data operation start point is located is determined according to the data operation start point value. For example, for the foregoing specific example, in step S5, each designated node of the file distribution to be read and written is determined according to the designated file identifier, and in this step S612, the node is used to determine which node of each designated node the data operation start point value of < off ═ 60M > is specifically located, and the node is the above-mentioned second target node. Specifically, according to the data operation starting point value, matching is performed in the first mapping relationship between each of the first block data and the first node, for example, the file data to be read and written is sequentially divided into a plurality of first block data according to the size of 25M, then according to < off ═ 60M >, it can be determined that the data operation starting point value is located in the divided 3 rd first block data, then matching is performed from the first mapping relationship, and the first node corresponding to the first block data is obtained, that is, the second target node can be determined.
In steps S613 to S614, the second target node writes the data write information starting from the data operation start point value to obtain the updated block data. If the size of the updated block data exceeds the first preset size, the updated block data is again divided into blocks and stored across a plurality of nodes of the cluster server through the preset hash algorithm, so that the load of the nodes of the whole cluster server is kept as balanced as possible. After step S614, the first mapping relationship is updated accordingly.
The data write operation of this embodiment is completed entirely in the nodes of the cluster server and requires no operation in the metadata server, which greatly reduces the pressure on the metadata server and is conducive to serving high-concurrency scenarios.
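Steps S613 to S614 can be sketched as a write into one block followed by a re-split when the block outgrows the preset size. This is only an illustration: the application does not state whether the write overwrites or inserts, so the sketch assumes an overwrite that extends the block when the payload runs past its end, and the names `write_into_block`, `split_blocks`, and `BLOCK_SIZE` are hypothetical.

```python
from typing import List


def write_into_block(block: bytes, inner_offset: int, payload: bytes) -> bytes:
    """Overwrite `block` with `payload` starting at `inner_offset`.

    Assumption: existing bytes are overwritten, and the block grows when
    the payload runs past its current end.
    """
    if inner_offset < 0:
        raise ValueError("inner_offset must be >= 0")
    if inner_offset > len(block):
        # Pad with zero bytes so the payload starts at the requested offset.
        block = block + b"\x00" * (inner_offset - len(block))
    return block[:inner_offset] + payload + block[inner_offset + len(payload):]


def split_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Divide data into consecutive pieces of at most `block_size` bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


BLOCK_SIZE = 8  # hypothetical "first preset size", tiny for illustration
updated = write_into_block(b"abcdefgh", 6, b"XXXX")
# Re-split only when the updated block exceeds the preset size.
pieces = split_blocks(updated, BLOCK_SIZE) if len(updated) > BLOCK_SIZE else [updated]
print(pieces)  # [b'abcdefXX', b'XX']
```

Each resulting piece would then be assigned to a node again by the preset hash algorithm, and the first mapping relationship updated to match.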
In an embodiment, the step S102 of determining, according to the number of the first block data corresponding to each file, each first node for storing each first block data from all nodes of the cluster server according to a preset hash algorithm, and establishing a first mapping relationship between each first block data and each first node includes:
S1021, numbering all nodes in the cluster server sequentially with Arabic numerals starting from 0, to obtain the number of each node;
S1022, numbering all the first block data corresponding to one file sequentially according to a preset numbering rule, to obtain all file block numbers corresponding to that file;
S1023, performing, according to the preset hash algorithm, a hash calculation on the file identifier corresponding to the file together with the file block number of each first block data of that file, to obtain a hash value corresponding to each first block data;
S1024, taking the hash value corresponding to each first block data modulo the maximum number of nodes, to obtain a modulo operation result;
and S1025, respectively mapping each first block of data to each first node according to a preset mapping rule, wherein the preset mapping rule is that the modulo operation result of the first block of data is equal to the node number of the first node.
In this embodiment, the nodes in the cluster server are numbered, for example, from 0 to N-1. A hash calculation is performed on the value corresponding to each file in the key-value table of the directory tree (that is, the file identifier) together with the file block number of each first block data, to obtain a hash value corresponding to each first block data. The hash value is taken modulo N to obtain a remainder i, and the node numbered i in the cluster server is the first node corresponding to that first block data. The hash algorithm itself is prior art and is not described in detail here.
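The hash-and-modulo placement of steps S1021 to S1025 can be sketched as follows. The application does not name a hash function, so SHA-256 is used here purely as an assumption; the function names, the `file_id:block_no` key format, and the 0-based block numbering are likewise hypothetical.

```python
import hashlib
from typing import Dict


def node_for_block(file_id: str, block_no: int, node_count: int) -> int:
    """Map (file identifier, file block number) to a node number in 0..N-1."""
    if node_count <= 0:
        raise ValueError("node_count must be > 0")
    key = f"{file_id}:{block_no}".encode("utf-8")
    digest = hashlib.sha256(key).digest()        # assumed hash algorithm
    hash_value = int.from_bytes(digest[:8], "big")
    return hash_value % node_count               # remainder i selects node i


def build_first_mapping(file_id: str, block_count: int,
                        node_count: int) -> Dict[int, int]:
    """Build the first mapping relationship: block number -> node number."""
    return {b: node_for_block(file_id, b, node_count)
            for b in range(block_count)}


mapping = build_first_mapping("0007", block_count=6, node_count=4)
print(mapping)
```

Because the node number depends only on the file identifier, the block number, and N, any node or client can recompute the placement without asking the metadata server.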
In one embodiment, the metadata server is a distributed metadata server cluster, and the step S1 of storing the directory tree of the file system in the metadata server includes:
S121, storing the directory tree of the file system in each metadata server in the metadata server cluster, taking one metadata server as the master metadata server and the other metadata servers as slave metadata servers, wherein the master metadata server is used for providing the metadata service externally;
S122, judging whether the master metadata server fails;
and S123, if the master metadata server fails, selecting one of the slave metadata servers as a new master metadata server, and providing the metadata service externally through the new master metadata server.
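Steps S121 to S123 can be sketched as a small replicated lookup table with failover. This is an assumption-laden illustration: the class `MetadataCluster`, the health flags, and the policy of promoting the first healthy slave are hypothetical, since the application does not specify how failure is detected or how the new master is selected.

```python
from typing import Dict, List, Optional


class MetadataCluster:
    """Every metadata server holds a copy of the directory tree (S121)."""

    def __init__(self, server_names: List[str],
                 directory_tree: Dict[str, str]) -> None:
        self.replicas = {name: dict(directory_tree) for name in server_names}
        self.healthy = {name: True for name in server_names}
        self.master = server_names[0]  # the others act as slaves

    def mark_failed(self, name: str) -> None:
        self.healthy[name] = False

    def ensure_master(self) -> str:
        """S122/S123: keep the master if healthy, else promote a slave."""
        if self.healthy[self.master]:
            return self.master
        for name in self.replicas:
            if name != self.master and self.healthy[name]:
                self.master = name  # assumed policy: first healthy slave
                return name
        raise RuntimeError("no healthy metadata server available")

    def lookup(self, key: str) -> Optional[str]:
        """Serve a directory-tree lookup from the current master."""
        return self.replicas[self.ensure_master()].get(key)


cluster = MetadataCluster(["mds0", "mds1", "mds2"], {"0001/0004/file3": "0007"})
cluster.mark_failed("mds0")
print(cluster.lookup("0001/0004/file3"), cluster.master)  # 0007 mds1
```

Because every server already stores the full directory tree, promotion in this sketch requires no data transfer; only the identity of the serving node changes.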
In an embodiment, the step S1 of storing the attribute information corresponding to the file system in a distributed manner in the cluster server includes:
S111, dividing the attribute information into a plurality of second block data according to a second preset size;
S112, according to the number of the second block data, determining each second node for storing each second block data from all nodes of the cluster server through a preset hash algorithm, and establishing a second mapping relation between each second block data and each second node;
and S113, respectively storing the second block data to the second nodes according to the second mapping relation.
In this embodiment, the plurality of second block data are distributed to a plurality of nodes of the cluster server through a preset hash algorithm. A second node is selected from the cluster server and refers to a single node storing a single second block data. The second block data of the attribute information are allocated through the preset hash algorithm, each second block data is located to its corresponding second node, and the mapping relationship between each second block data and each second node is established, so that the load of the nodes in the cluster server is kept as uniform as possible and the situation in which some nodes are overloaded while others are underloaded is avoided. Specifically, the allocation performed by the preset hash algorithm includes numbering the nodes in the cluster server, for example from 0 to N-1, performing a hash calculation on the file block number of each second block data to obtain a hash value corresponding to each second block data, taking the hash value modulo N to obtain a remainder i, and determining the node numbered i in the cluster server as the second node corresponding to that second block data. The hash calculation itself is prior art and is not described in detail here.
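Steps S111 to S113 can be sketched as splitting the attribute information, hashing each block number, and storing each block on the resulting node. The sketch assumes SHA-256 as the hash function and a serialized byte string as the attribute information; neither is specified in the application, and the function names and sample record are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """S111: divide the attribute information by the second preset size."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def distribute(blocks: List[bytes], node_count: int
               ) -> Tuple[Dict[int, int], Dict[int, Dict[int, bytes]]]:
    """S112/S113: build the second mapping and store each block on its node."""
    second_mapping: Dict[int, int] = {}
    storage: Dict[int, Dict[int, bytes]] = defaultdict(dict)
    for block_no, block in enumerate(blocks):
        digest = hashlib.sha256(str(block_no).encode("utf-8")).digest()
        node = int.from_bytes(digest[:8], "big") % node_count  # remainder i
        second_mapping[block_no] = node
        storage[node][block_no] = block
    return second_mapping, storage


# Hypothetical serialized attribute record.
attributes = b"size=1024;mtime=1582700000;perm=rw"
blocks = split_into_blocks(attributes, block_size=8)
second_mapping, storage = distribute(blocks, node_count=4)
rebuilt = b"".join(storage[second_mapping[n]][n] for n in sorted(second_mapping))
print(rebuilt == attributes)  # True
```

The second mapping alone is enough to locate and reassemble the attribute information, which is also how a later attribute update would find the node to modify.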
In an embodiment, after the step S6 of completing the read/write operation with each of the designated nodes according to the read/write operation information, the method includes:
and S7, updating the attribute information corresponding to the file system based on the read-write operation.
In this embodiment, the attribute information records attributes such as file size, modification time, and read-write permission. When a file of the file system is read or written, the attribute information of the file system is modified accordingly, based on the specific read-write operation. Specifically, the node in which the attribute information is stored is determined according to the second mapping relationship, and the attribute information is modified at that node.
In an embodiment, after the step S3 of determining whether the directory tree includes the specified directory information, the method includes:
S401, if not, allocating a corresponding new file identifier to the specified directory information in the directory tree;
S501, allocating, through a preset hash algorithm, nodes for storing the file data corresponding to the new file identifier.
In this embodiment, in step S401, when the directory tree does not include the specified directory information, a new file identifier is allocated to the specified directory information and the directory tree is updated. For example, if the specified directory information is <FS1/dir3/file3> and the corresponding directory information cannot be found in the directory tree in table 1, a new file identifier, for example "0007", is allocated to the directory information, and the directory tree is updated with a new key "0001/0004/file3" whose corresponding value is "0007".
In step S501, a node for the file data corresponding to the new file identifier is allocated through the preset hash algorithm, so that the file corresponding to the specified directory information is created in that node. Through steps S401 to S501, the user can add new file data to the file system. Only the directory tree needs to be updated in the metadata server; the remaining operations of creating the file and writing data are completed at the nodes of the cluster server, which greatly reduces the pressure on the metadata server and is conducive to serving high-concurrency scenarios.
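Steps S401 to S501 can be sketched as a lookup-or-create on the key-value directory tree followed by a hash-based node allocation. The pre-existing tree entries below are hypothetical (table 1 is not reproduced here), and both the rule for choosing the next identifier and the use of SHA-256 are assumptions made only for illustration.

```python
import hashlib
from typing import Dict, Tuple


def resolve_or_create(directory_tree: Dict[str, str],
                      key: str) -> Tuple[str, bool]:
    """Return (file identifier, created flag) for a directory-tree key."""
    file_id = directory_tree.get(key)
    if file_id is not None:
        return file_id, False
    # Assumption: the new identifier is one greater than the largest in use.
    next_number = max((int(v) for v in directory_tree.values()), default=0) + 1
    new_id = f"{next_number:04d}"
    directory_tree[key] = new_id  # S401: update the directory tree
    return new_id, True


def allocate_node(file_id: str, node_count: int) -> int:
    """S501: pick a node for the new file's data by hash modulo N."""
    digest = hashlib.sha256(file_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


# Hypothetical existing entries; only the key for file3 comes from the text.
tree = {"0001/0002/file1": "0005", "0001/0003/file2": "0006"}
new_id, created = resolve_or_create(tree, "0001/0004/file3")
node = allocate_node(new_id, node_count=5)
print(new_id, created)  # 0007 True
```

The metadata server only performs the dictionary update; creating the file contents and writing data then happen on the allocated node.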
Referring to fig. 2, an embodiment of the present application provides a data management apparatus, including:
a data storage unit 10 for storing a directory tree of the file system in the metadata server; partitioning the file data of each file contained in the file system, and respectively storing each block in a plurality of nodes of a cluster server; partitioning the attribute information corresponding to the file system, and respectively storing each block in a plurality of nodes of the cluster server;
the instruction obtaining unit 20 is configured to obtain a data read-write instruction sent by a user, where the data read-write instruction carries specified directory information and read-write operation information of a file to be read and written;
a determining unit 30, configured to determine whether the directory tree includes the specified directory information, where the directory tree is stored in a key-value form, directory information of each file is used as a key, and a file identifier corresponding to each file is used as a value;
an identifier extracting unit 40, configured to extract, if the specified directory information is included, a specified file identifier corresponding to the specified directory information from the metadata server;
a node determining unit 50, configured to determine, according to the specified file identifier, each specified node where file data corresponding to the file to be read and written is distributed in the cluster server;
and a read-write operation unit 60, configured to complete read-write operation with each of the designated nodes according to the read-write operation information.
In this embodiment, the implementation processes of the functions and actions of the data storage unit 10, the instruction obtaining unit 20, the determining unit 30, the identifier extracting unit 40, the node determining unit 50, and the read-write operating unit 60 in the data management apparatus are specifically described in the implementation processes of steps S1-S6 in the data management method, and are not described herein again.
The data management device of the embodiment stores the directory tree of the file system in the metadata server; partitioning file data of each file contained in the file system, and respectively storing each block in a plurality of nodes of the cluster server; partitioning attribute information corresponding to the file system, and respectively storing each block in a plurality of nodes of the cluster server; when data read-write operation is carried out, firstly, a data read-write instruction sent by a user is obtained, wherein the data read-write instruction carries specified directory information and read-write operation information of a file to be read and written; when the directory tree contains the specified directory information, extracting the specified file identification corresponding to the specified directory information from the metadata server; determining each designated node distributed in the cluster server by file data corresponding to the file to be read and written according to the designated file identifier; finally, completing the read-write operation between the designated nodes according to the read-write operation information; therefore, only the designated file identification corresponding to the file to be read and written is searched from the metadata server, the read-write operation can be carried out on the corresponding designated node according to the designated file identification, and the read-write operation is not required to be carried out on the metadata server, so that on one hand, the operating pressure of the metadata server is greatly reduced, and on the other hand, the data reading efficiency of a high-concurrency scene is greatly improved.
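The overall flow summarized above (one metadata lookup, then all data access on the cluster nodes) can be sketched end to end. Everything named here is hypothetical: the `Cluster` class, the tiny `BLOCK_SIZE`, the SHA-256 placement, and the choice to let a read continue across block boundaries are assumptions for illustration, not details confirmed by the application.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # hypothetical "first preset size", tiny for illustration


def node_of(file_id: str, block_no: int, node_count: int) -> int:
    digest = hashlib.sha256(f"{file_id}:{block_no}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


class Cluster:
    def __init__(self, node_count: int) -> None:
        # Each node is modeled as a dict keyed by (file identifier, block number).
        self.nodes: List[Dict[Tuple[str, int], bytes]] = [
            {} for _ in range(node_count)]

    def store(self, file_id: str, data: bytes) -> None:
        for block_no, start in enumerate(range(0, len(data), BLOCK_SIZE)):
            node = node_of(file_id, block_no, len(self.nodes))
            self.nodes[node][(file_id, block_no)] = data[start:start + BLOCK_SIZE]

    def read(self, file_id: str, offset: int, length: int) -> bytes:
        out = bytearray()
        while length > 0:
            block_no, inner = divmod(offset, BLOCK_SIZE)
            node = node_of(file_id, block_no, len(self.nodes))
            block = self.nodes[node].get((file_id, block_no), b"")
            piece = block[inner:inner + length]
            if not piece:  # past the end of the file
                break
            out += piece
            offset += len(piece)
            length -= len(piece)
        return bytes(out)


def handle_read(directory_tree: Dict[str, str], cluster: Cluster,
                key: str, offset: int, length: int) -> bytes:
    file_id = directory_tree.get(key)  # the only metadata-server access
    if file_id is None:
        raise FileNotFoundError(key)
    return cluster.read(file_id, offset, length)  # served by cluster nodes


tree = {"0001/0004/file3": "0007"}
cluster = Cluster(node_count=3)
cluster.store("0007", b"hello world")
print(handle_read(tree, cluster, "0001/0004/file3", 3, 6))  # b'lo wor'
```

In this sketch the directory tree is consulted exactly once per request, which reflects the reduction in metadata-server load that the embodiment describes.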
In one embodiment, the data storage unit 10 includes:
the first dividing subunit is used for dividing the file data of each file into a plurality of first blocks of data according to a first preset size;
the first allocating subunit is configured to determine, according to the number of first block data corresponding to each file, each first node for storing each first block data according to a preset hash algorithm from all nodes of the cluster server, and establish a first mapping relationship between each first block data and each first node;
and the first storage subunit is configured to store the first block data corresponding to each file to each first node according to the first mapping relationship.
In this embodiment, the implementation processes of the functions and actions of the first dividing subunit, the first allocating subunit, and the first storage subunit in the data management apparatus are specifically described in the implementation processes of steps S101 to S103 in the data management method, and are not described herein again.
In one embodiment, the read/write operation unit 60 includes:
the first reading subunit is used for extracting the data reading starting point value and the data reading length value from the read-write operation information;
the first determining subunit is used for determining a first target node corresponding to the data reading starting point value from each designated node;
a first operation subunit, configured to read, in the first target node and using the data read start point value as the start point, the read data corresponding to the data read length value from the first block data corresponding to the first target node.
In this embodiment, the implementation processes of the functions and actions of the first reading subunit, the first determining subunit, and the first operation subunit in the data management apparatus are specifically described in the implementation processes of steps S601 to S603 in the data management method, and are not described herein again.
In one embodiment, the read/write operation unit 60 includes:
the second reading subunit is used for extracting the data operation starting point value and the data writing information from the reading and writing operation information;
the second determining subunit is used for determining a second target node corresponding to the data operation starting point value from the plurality of designated nodes;
a second operation subunit, configured to write, in the second target node, the data write information into the first block data corresponding to the second target node using the data operation start point value as a start point, so as to obtain update block data corresponding to the second target node;
and the redistribution subunit is used for redistributing each node of the updated block data stored in the cluster server according to a preset hash algorithm.
In this embodiment, the implementation processes of the functions and actions of the second reading subunit, the second determining subunit, the second operation subunit, and the redistribution subunit in the data management apparatus are specifically described in the implementation processes of steps S611 to S614 in the data management method, and are not described herein again.
In one embodiment, the first allocation subunit includes:
the first numbering module is used for numbering all the nodes in the cluster server in sequence according to Arabic numbers starting from 0 to respectively obtain the number of each node;
the second numbering module is used for numbering all the first block data corresponding to one file in sequence according to a preset numbering rule to obtain all file block numbers corresponding to the file;
the hash calculation module is used for respectively carrying out hash calculation on a file identifier corresponding to one file and a file block number corresponding to each first block of data corresponding to the file identifier according to the preset hash algorithm to respectively obtain a hash value corresponding to each first block of data;
the modulo operation module is used for taking the hash value corresponding to each first block data modulo the maximum number of nodes, to obtain a modulo operation result;
and the mapping module is used for mapping each first block of data to each first node according to a preset mapping rule, wherein the preset mapping rule is that the modulo operation result of the first block of data is equal to the node number of the first node.
In this embodiment, the implementation processes of the functions and actions of the first numbering module, the second numbering module, the hash calculation module, the modulo operation module, and the mapping module in the first allocating subunit are specifically described in the implementation processes of steps S1021 to S1025 in the data management method, and are not described herein again.
In an embodiment, the metadata server is a distributed metadata server cluster, and the data storage unit 10 includes:
the directory tree storage subunit is configured to store a directory tree of the file system in each metadata server in the metadata server cluster, and use one metadata server as a master metadata server and the other metadata servers as slave metadata servers, where the master metadata server is configured to provide a metadata service to the outside;
a judging subunit, configured to judge whether the master metadata server fails;
and the reselection subunit is used for selecting one of the slave metadata servers as a new master metadata server if the master metadata server fails, and providing the metadata service externally through the new master metadata server.
In this embodiment, the implementation processes of the functions and actions of the directory tree storage subunit, the judging subunit, and the reselection subunit in the data storage unit 10 are specifically described in the implementation processes of steps S121 to S123 in the data management method, and are not described herein again.
In one embodiment, the data storage unit 10 includes:
the second dividing subunit is used for dividing the attribute information into a plurality of second block data according to a second preset size;
the second allocating subunit is configured to determine, according to the number of the second block data, each second node for storing each second block data from all nodes of the cluster server through a preset hash algorithm, and establish a second mapping relationship between each second block data and each second node;
and the second storage subunit is configured to store, according to the second mapping relationship, each piece of the second block data to each second node, respectively.
In this embodiment, the implementation processes of the functions and actions of the second dividing subunit, the second allocating subunit, and the second storage subunit in the data management apparatus are specifically described in the implementation processes of steps S111 to S113 in the data management method, and are not described herein again.
In one embodiment, the data management apparatus further includes:
and the attribute updating unit is used for updating the attribute information corresponding to the file system based on the read-write operation.
In this embodiment, the detailed implementation process of the function and the action of the attribute updating unit in the data management apparatus is described in the implementation process corresponding to step S7 in the data management method, and is not described herein again.
In one embodiment, the data management apparatus further includes:
the identification distribution unit is used for distributing a corresponding new file identification for the specified directory information in the directory tree if the specified directory information is not contained;
and the node distribution unit is used for distributing nodes for storing the file data corresponding to the new file identification for the new file identification through a preset hash algorithm.
In this embodiment, the implementation process of the functions and actions of the identifier allocating unit and the node allocating unit in the data management apparatus is specifically described in the implementation process corresponding to steps S401 to S501 in the data management method, and is not described herein again.
Referring to fig. 3, an embodiment of the present application also provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing data such as the directory tree. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a data management method.
The data management method executed by the processor includes:
storing a directory tree of a file system in a metadata server; partitioning the file data of each file contained in the file system, and respectively storing each block in a plurality of nodes of a cluster server; partitioning the attribute information corresponding to the file system, and respectively storing each block in a plurality of nodes of the cluster server;
acquiring a data reading and writing instruction sent by a user, wherein the data reading and writing instruction carries specified directory information and reading and writing operation information of a file to be read and written;
judging whether the directory tree contains the specified directory information or not, wherein the directory tree is stored in a key-value mode, the directory information of each file is taken as a key, and a file identifier corresponding to each file is taken as a value;
if yes, extracting a specified file identifier corresponding to the specified directory information from the metadata server;
determining each designated node distributed in the cluster server by the file data corresponding to the file to be read and written according to the designated file identifier;
and completing the read-write operation with each appointed node according to the read-write operation information.
In an embodiment, the step of the processor partitioning file data of each file included in the file system and storing each block in a plurality of nodes of the cluster server includes:
dividing the file data of each file into a plurality of first blocks of data according to a first preset size;
determining each first node for storing each first block of data from all nodes of the cluster server according to a preset hash algorithm and respectively according to the number of the first blocks of data corresponding to each file, and establishing a first mapping relation between each first block of data and each first node;
and respectively storing the first block data corresponding to each file to each first node according to the first mapping relation.
In an embodiment, the read-write operation information includes a data read start value and a data read length value, and the step of the processor completing the read-write operation with each of the designated nodes according to the read-write operation information includes:
extracting the data reading starting point value and the data reading length value from the read-write operation information;
determining a first target node corresponding to the data reading starting point value from each designated node;
in the first target node, reading the read data corresponding to the data read length value from the first block of data corresponding to the first target node with the data read start point value as a start point.
In an embodiment, the read-write operation information includes a data operation start point value and data write-in information, and the step of the processor completing the read-write operation with each of the designated nodes according to the read-write operation information includes:
extracting the data operation starting point value and the data writing information from the read-write operation information;
determining a second target node corresponding to the data operation starting point value from a plurality of designated nodes;
in the second target node, with the data operation starting point value as a starting point, writing the data writing information into the first block data corresponding to the second target node to obtain update block data corresponding to the second target node;
and redistributing each node of the updated block data stored in the cluster server according to a preset hash algorithm.
In an embodiment, the processor determines, according to a preset hash algorithm, each first node for storing each first block of data from all nodes of the cluster server according to the number of the first block of data corresponding to each file, and the step of establishing a first mapping relationship between each first block of data and each first node includes:
numbering all nodes in the cluster server in sequence according to Arabic numbers starting from 0 to respectively obtain the number of each node;
numbering all first block data corresponding to one file in sequence according to a preset numbering rule to obtain all file block numbers corresponding to the file;
according to the preset hash algorithm, respectively carrying out hash calculation on a file identifier corresponding to one file and a file block number corresponding to each first block of data corresponding to the file identifier, and respectively obtaining a hash value corresponding to each first block of data;
taking the hash value corresponding to each first block data modulo the maximum number of nodes, to obtain a modulo operation result;
and mapping each first block of data to each first node according to a preset mapping rule, wherein the preset mapping rule is that the modulo operation result of the first block of data is equal to the node number of the first node.
In an embodiment, the metadata server is a distributed metadata server cluster, and the step of the processor storing the directory tree of the file system in the metadata server includes:
storing the directory tree of the file system in each metadata server in the metadata server cluster, wherein one metadata server is used as the master metadata server and the other metadata servers are used as slave metadata servers, and wherein the master metadata server is used for providing the metadata service externally;
judging whether the master metadata server fails;
and if the master metadata server fails, selecting one of the slave metadata servers as a new master metadata server, and providing the metadata service externally through the new master metadata server.
In an embodiment, after the step of determining whether the directory tree includes the specified directory information, the processor further performs:
if not, distributing a corresponding new file identifier for the specified directory information in the directory tree;
and distributing nodes for storing the file data corresponding to the new file identification for the new file identification through a preset hash algorithm.
Those skilled in the art will appreciate that the architecture shown in fig. 3 is only a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects may be applied.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a data management method, and specifically:
storing a directory tree of a file system in a metadata server; partitioning the file data of each file contained in the file system, and respectively storing each block in a plurality of nodes of a cluster server; partitioning the attribute information corresponding to the file system, and respectively storing each block in a plurality of nodes of the cluster server;
acquiring a data reading and writing instruction sent by a user, wherein the data reading and writing instruction carries specified directory information and reading and writing operation information of a file to be read and written;
judging whether the directory tree contains the specified directory information or not, wherein the directory tree is stored in a key-value mode, the directory information of each file is taken as a key, and a file identifier corresponding to each file is taken as a value;
if yes, extracting a specified file identifier corresponding to the specified directory information from the metadata server;
determining each designated node distributed in the cluster server by the file data corresponding to the file to be read and written according to the designated file identifier;
and completing the read-write operation with each appointed node according to the read-write operation information.
In an embodiment, the step of the processor partitioning file data of each file included in the file system and storing each block in a plurality of nodes of the cluster server includes:
dividing the file data of each file into a plurality of first blocks of data according to a first preset size;
determining each first node for storing each first block of data from all nodes of the cluster server according to a preset hash algorithm and respectively according to the number of the first blocks of data corresponding to each file, and establishing a first mapping relation between each first block of data and each first node;
and respectively storing the first block data corresponding to each file to each first node according to the first mapping relation.
In an embodiment, the read-write operation information includes a data read start value and a data read length value, and the step of the processor completing the read-write operation with each of the designated nodes according to the read-write operation information includes:
extracting the data reading starting point value and the data reading length value from the read-write operation information;
determining a first target node corresponding to the data reading starting point value from each designated node;
in the first target node, reading the read data corresponding to the data read length value from the first block of data corresponding to the first target node with the data read start point value as a start point.
In an embodiment, the read-write operation information includes a data operation start point value and data write-in information, and the step of the processor completing the read-write operation with each of the designated nodes according to the read-write operation information includes:
extracting the data operation starting point value and the data writing information from the read-write operation information;
determining a second target node corresponding to the data operation starting point value from a plurality of designated nodes;
in the second target node, with the data operation starting point value as a starting point, writing the data writing information into the first block data corresponding to the second target node to obtain update block data corresponding to the second target node;
and redistributing each node of the updated block data stored in the cluster server according to a preset hash algorithm.
In an embodiment, the step in which the processor determines, according to a preset hash algorithm and according to the number of first block data corresponding to each file, each first node for storing each first block data from all nodes of the cluster server, and establishes a first mapping relationship between each first block data and each first node, includes:
numbering all nodes in the cluster server sequentially with Arabic numerals starting from 0 to obtain the number of each node;
numbering all first block data corresponding to one file sequentially according to a preset numbering rule to obtain the file block number of each first block data of the file;
performing, according to the preset hash algorithm, a hash calculation on the file identifier of the file together with the file block number of each first block data of the file, to obtain a hash value corresponding to each first block data;
taking the hash value corresponding to each first block data modulo the maximum node number to obtain a modulo result;
and mapping each first block data to a first node according to a preset mapping rule, wherein the preset mapping rule is that the modulo result of the first block data equals the node number of the first node.
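The mapping steps above can be illustrated with a short Python sketch. The application does not name the preset hash algorithm, so SHA-256 is used here purely as a stand-in, and the function names and key format are hypothetical. The text takes the hash value modulo the maximum node number; this sketch instead assumes the modulus is the node count (maximum node number plus one) so that every node numbered from 0 upward can be selected.

```python
import hashlib


def block_to_node(file_id: str, block_no: int, node_count: int) -> int:
    """Map one block to a node number in the range 0..node_count-1."""
    # Hash the file identifier together with the file block number.
    key = f"{file_id}:{block_no}".encode("utf-8")
    digest = hashlib.sha256(key).digest()   # stand-in for the preset hash algorithm
    hash_value = int.from_bytes(digest[:8], "big")
    # Modulo result == node number (the preset mapping rule).
    return hash_value % node_count


def build_mapping(file_id: str, block_count: int, node_count: int) -> dict:
    """First mapping relationship: file block number -> node number."""
    return {b: block_to_node(file_id, b, node_count) for b in range(block_count)}


# A file split into 6 blocks, placed on a 4-node cluster (nodes 0..3).
print(build_mapping("file-42", block_count=6, node_count=4))
```

Because the mapping is a pure function of the file identifier and block number, any client can recompute a block's location without asking the metadata server.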
In an embodiment, the metadata server is a distributed metadata server cluster, and the step in which the processor stores the directory tree of the file system in the metadata server includes:
storing the directory tree of the file system in each metadata server of the metadata server cluster, wherein one metadata server serves as the main metadata server and the other metadata servers serve as auxiliary metadata servers, the main metadata server being used to provide the metadata service externally;
judging whether the main metadata server has failed;
and if the main metadata server has failed, selecting one of the metadata servers as a new main metadata server, and using the new main metadata server to provide the metadata service externally.
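A minimal Python sketch of this main/auxiliary arrangement follows. The class names, the `alive` flag and the rule of promoting the first healthy server are assumptions made for illustration; the application does not specify how failure is detected or how the new main metadata server is chosen.

```python
class MetadataServer:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.directory_tree = {}  # directory information -> file identifier


class MetadataCluster:
    def __init__(self, names):
        self.servers = [MetadataServer(n) for n in names]
        self.primary = self.servers[0]  # main metadata server

    def put(self, path, file_id):
        # The directory tree is stored on every metadata server in the cluster.
        for server in self.servers:
            server.directory_tree[path] = file_id

    def lookup(self, path):
        # Check the main server before serving; fail over if it is down.
        if not self.primary.alive:
            self._select_new_primary()
        return self.primary.directory_tree.get(path)

    def _select_new_primary(self):
        # Assumed policy: promote the first server that is still alive.
        for server in self.servers:
            if server.alive:
                self.primary = server
                return
        raise RuntimeError("no metadata server available")


cluster = MetadataCluster(["mds-a", "mds-b", "mds-c"])
cluster.put("/docs/report.txt", "file-42")
cluster.servers[0].alive = False           # simulate failure of the main server
print(cluster.lookup("/docs/report.txt"))  # file-42
print(cluster.primary.name)                # mds-b
```

Since each server holds a full copy of the directory tree, the promoted server can answer lookups immediately after the switch.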
In an embodiment, after the step of judging whether the directory tree contains the specified directory information, the processor further performs:
if not, allocating a corresponding new file identifier for the specified directory information in the directory tree;
and allocating, through a preset hash algorithm, nodes for storing the file data corresponding to the new file identifier.
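The lookup-or-create behaviour can be sketched as follows. The directory tree is modelled as a plain dictionary keyed by directory information, in line with the key-value storage described in the claims; generating the identifier with `uuid4`, the function names and the use of SHA-256 as the hash are all assumptions, not details given by the application.

```python
import hashlib
import uuid


def resolve_file_id(directory_tree: dict, path: str):
    """Return (file identifier, created) for the given directory information."""
    file_id = directory_tree.get(path)
    if file_id is not None:
        return file_id, False
    # Not in the directory tree: allocate a new file identifier (assumed: random UUID).
    file_id = uuid.uuid4().hex
    directory_tree[path] = file_id
    return file_id, True


def node_for_block(file_id: str, block_no: int, node_count: int) -> int:
    """Pick the node for a block of the new file (SHA-256 as a stand-in hash)."""
    digest = hashlib.sha256(f"{file_id}:{block_no}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


tree = {}
file_id, created = resolve_file_id(tree, "/data/new.log")
print(created, node_for_block(file_id, 0, node_count=5))
```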
In summary, in the data management method, apparatus, computer device and storage medium of the present application, the directory tree of a file system is stored in a metadata server; the file data of each file contained in the file system is partitioned into blocks, and the blocks are stored in a plurality of nodes of the cluster server; the attribute information corresponding to the file system is likewise partitioned into blocks, and the blocks are stored in a plurality of nodes of the cluster server. When a data read-write operation is performed, only the specified file identifier corresponding to the file to be read or written needs to be looked up in the metadata server; the read-write operation can then be performed on the designated nodes corresponding to that file according to the specified file identifier, without performing the read-write operation on the metadata server. On the one hand, this greatly reduces the operating pressure on the metadata server; on the other hand, it greatly improves data reading efficiency in high-concurrency scenarios.
It will be understood by those of ordinary skill in the art that all or part of the processes of the method embodiments described above may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, database or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory.
The above description covers only preferred embodiments of the present application and is not intended to limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present application.

Claims (10)

1. A method for managing data, comprising:
storing a directory tree of a file system in a metadata server; partitioning the file data of each file contained in the file system, and respectively storing each block in a plurality of nodes of a cluster server; partitioning the attribute information corresponding to the file system, and respectively storing each block in a plurality of nodes of the cluster server;
acquiring a data read-write instruction sent by a user, wherein the data read-write instruction carries specified directory information and read-write operation information of a file to be read or written;
judging whether the directory tree contains the specified directory information, wherein the directory tree is stored as key-value pairs, with the directory information of each file as the key and the file identifier corresponding to each file as the value;
if yes, extracting a specified file identifier corresponding to the specified directory information from the metadata server;
determining, according to the specified file identifier, each designated node of the cluster server across which the file data corresponding to the file to be read or written is distributed;
and completing the read-write operation with each designated node according to the read-write operation information.
2. The data management method according to claim 1, wherein the step of partitioning the file data of each file contained in the file system and respectively storing each block in a plurality of nodes of a cluster server comprises:
dividing the file data of each file into a plurality of first block data according to a first preset size;
determining, according to a preset hash algorithm and according to the number of first block data corresponding to each file, each first node for storing each first block data from all nodes of the cluster server, and establishing a first mapping relationship between each first block data and each first node;
and storing the first block data corresponding to each file in the respective first nodes according to the first mapping relationship.
3. The data management method according to claim 2, wherein the read-write operation information includes a data read start point value and a data read length value, and the step of completing the read-write operation with each of the designated nodes according to the read-write operation information includes:
extracting the data read start point value and the data read length value from the read-write operation information;
determining, from the designated nodes, a first target node corresponding to the data read start point value;
in the first target node, reading data of the length given by the data read length value from the first block data corresponding to the first target node, taking the data read start point value as the starting point.
4. The data management method according to claim 2, wherein the read-write operation information includes a data operation start point value and data write information, and the step of completing the read-write operation with each of the designated nodes according to the read-write operation information includes:
extracting the data operation start point value and the data write information from the read-write operation information;
determining, from the designated nodes, a second target node corresponding to the data operation start point value;
in the second target node, writing the data write information into the first block data corresponding to the second target node, taking the data operation start point value as the starting point, to obtain updated block data corresponding to the second target node;
and reallocating, according to a preset hash algorithm, the nodes of the cluster server that store the updated block data.
5. The data management method according to claim 2, wherein the step of determining, according to a preset hash algorithm and according to the number of first block data corresponding to each file, each first node for storing each first block data from all nodes of the cluster server, and establishing the first mapping relationship between each first block data and each first node comprises:
numbering all nodes in the cluster server sequentially with Arabic numerals starting from 0 to obtain the number of each node;
numbering all first block data corresponding to one file sequentially according to a preset numbering rule to obtain the file block number of each first block data of the file;
performing, according to the preset hash algorithm, a hash calculation on the file identifier of the file together with the file block number of each first block data of the file, to obtain a hash value corresponding to each first block data;
taking the hash value corresponding to each first block data modulo the maximum node number to obtain a modulo result;
and mapping each first block data to a first node according to a preset mapping rule, wherein the preset mapping rule is that the modulo result of the first block data equals the node number of the first node.
6. The data management method according to claim 1, wherein the metadata server is a distributed metadata server cluster, and the step of storing the directory tree of the file system in the metadata server comprises:
storing the directory tree of the file system in each metadata server of the metadata server cluster, wherein one metadata server serves as the main metadata server and the other metadata servers serve as auxiliary metadata servers, the main metadata server being used to provide the metadata service externally;
judging whether the main metadata server has failed;
and if the main metadata server has failed, selecting one of the metadata servers as a new main metadata server, and using the new main metadata server to provide the metadata service externally.
7. The data management method according to claim 1, wherein, after the step of judging whether the directory tree contains the specified directory information, the method comprises:
if not, allocating a corresponding new file identifier for the specified directory information in the directory tree;
and allocating, through a preset hash algorithm, nodes for storing the file data corresponding to the new file identifier.
8. A data management apparatus, comprising:
a data storage unit, configured to store the directory tree of the file system in the metadata server; partition the file data of each file contained in the file system, and respectively store each block in a plurality of nodes of a cluster server; and partition the attribute information corresponding to the file system, and respectively store each block in a plurality of nodes of the cluster server;
an instruction acquisition unit, configured to acquire a data read-write instruction sent by a user, wherein the data read-write instruction carries specified directory information and read-write operation information of a file to be read or written;
a judging unit, configured to judge whether the directory tree contains the specified directory information, wherein the directory tree is stored as key-value pairs, with the directory information of each file as the key and the file identifier corresponding to each file as the value;
an identifier extraction unit, configured to extract the specified file identifier corresponding to the specified directory information from the metadata server if the specified directory information is contained;
a node determining unit, configured to determine, according to the specified file identifier, each designated node of the cluster server across which the file data corresponding to the file to be read or written is distributed;
and a read-write operation unit, configured to complete the read-write operation with each designated node according to the read-write operation information.
9. A computer device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202010120131.9A 2020-02-26 2020-02-26 Data management method, device, computer equipment and storage medium Active CN111427841B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010120131.9A CN111427841B (en) 2020-02-26 2020-02-26 Data management method, device, computer equipment and storage medium
PCT/CN2020/098793 WO2021169113A1 (en) 2020-02-26 2020-06-29 Data management method and apparatus, and computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010120131.9A CN111427841B (en) 2020-02-26 2020-02-26 Data management method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111427841A (en) 2020-07-17
CN111427841B (en) 2024-08-27

Family

ID=71547266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010120131.9A Active CN111427841B (en) 2020-02-26 2020-02-26 Data management method, device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111427841B (en)
WO (1) WO2021169113A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114519033A (en) * 2022-02-21 2022-05-20 深圳市和讯华谷信息技术有限公司 Data writing method and related equipment thereof
CN115292247B (en) * 2022-09-28 2022-12-06 北京鼎轩科技有限责任公司 File reading method and device, electronic equipment and storage medium
CN116760850B (en) * 2023-08-17 2024-01-12 浪潮电子信息产业股份有限公司 Data processing method, device, equipment, medium and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855284A (en) * 2012-08-03 2013-01-02 北京联创信安科技有限公司 Method and system for managing data of cluster storage system
CN103002027A (en) * 2012-11-26 2013-03-27 中国科学院高能物理研究所 System and method for data storage on basis of key-value pair system tree-shaped directory achieving structure
US20160063021A1 (en) * 2014-08-28 2016-03-03 Futurewei Technologies, Inc. Metadata Index Search in a File System
CN106874383A (en) * 2017-01-10 2017-06-20 清华大学 A kind of decoupling location mode of metadata of distributed type file system
CN107480310A (en) * 2017-09-29 2017-12-15 郑州云海信息技术有限公司 A kind of metadata cluster catalogue dynamic load balancing method of release and system
CN107562757A (en) * 2016-07-01 2018-01-09 阿里巴巴集团控股有限公司 Inquiry, access method based on distributed file system, apparatus and system
US10140304B1 (en) * 2015-12-10 2018-11-27 EMC IP Holding Company LLC Distributed metadata servers in a file system with separate metadata servers for file metadata and directory metadata

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101692239B (en) * 2009-10-19 2012-10-03 浙江大学 Method for distributing metadata of distributed type file system
CN104660643A (en) * 2013-11-25 2015-05-27 南京中兴新软件有限责任公司 Request response method and device and distributed file system
CN105718484A (en) * 2014-12-04 2016-06-29 中兴通讯股份有限公司 File writing method, file reading method, file deletion method, file query method and client
CN108769137A (en) * 2018-05-08 2018-11-06 北京初志科技有限公司 Distributed structure/architecture data storing and reading method and device based on multigroup framework

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111831618A (en) * 2020-07-21 2020-10-27 北京青云科技股份有限公司 Data writing method, data reading method, device, equipment and storage medium
CN112000618A (en) * 2020-08-07 2020-11-27 北京浪潮数据技术有限公司 File change management method, device, equipment and storage medium for cluster nodes
CN112000618B (en) * 2020-08-07 2022-06-07 北京浪潮数据技术有限公司 File change management method, device, equipment and storage medium for cluster nodes
CN112947864B (en) * 2021-03-29 2024-03-08 南方电网数字平台科技(广东)有限公司 Metadata storage method, apparatus, device and storage medium
CN112947864A (en) * 2021-03-29 2021-06-11 南方电网数字电网研究院有限公司 Metadata storage method, device, equipment and storage medium
CN113076298A (en) * 2021-04-15 2021-07-06 上海卓钢链科技有限公司 Distributed small file storage system
CN113076298B (en) * 2021-04-15 2024-10-01 上海卓钢链科技有限公司 Distributed small file storage system
CN113392068A (en) * 2021-06-28 2021-09-14 上海商汤科技开发有限公司 Data processing method, device and system
CN114328421A (en) * 2022-03-17 2022-04-12 联想凌拓科技有限公司 Metadata service architecture management method, computer system, electronic device and medium
CN115827261A (en) * 2023-01-10 2023-03-21 北京燧原智能科技有限公司 Data synchronization method, device, server and medium based on distributed network
CN116431583A (en) * 2023-04-18 2023-07-14 中国长江三峡集团有限公司 Electronic file processing method and device, electronic equipment and readable storage medium
CN116185965A (en) * 2023-05-04 2023-05-30 联想凌拓科技有限公司 Method, apparatus, device and medium for quality of service control
CN116185965B (en) * 2023-05-04 2023-08-04 联想凌拓科技有限公司 Method, apparatus, device and medium for quality of service control
CN116795636B (en) * 2023-06-21 2024-02-13 广州市玄武无线科技股份有限公司 Service system data monitoring method and device, electronic equipment and storage medium
CN116795636A (en) * 2023-06-21 2023-09-22 广州市玄武无线科技股份有限公司 Service system data monitoring method and device, electronic equipment and storage medium
CN118643019A (en) * 2024-08-14 2024-09-13 南京云创大数据科技股份有限公司 Metadata management method and system suitable for distributed file system

Also Published As

Publication number Publication date
WO2021169113A1 (en) 2021-09-02
CN111427841B (en) 2024-08-27

Similar Documents

Publication Publication Date Title
CN111427841B (en) Data management method, device, computer equipment and storage medium
US11797498B2 (en) Systems and methods of database tenant migration
CN103518364B (en) The data-updating method of distributed memory system and server
US11169978B2 (en) Distributed pipeline optimization for data preparation
US10374792B1 (en) Layout-independent cryptographic stamp of a distributed dataset
US10853242B2 (en) Deduplication and garbage collection across logical databases
US11157445B2 (en) Indexing implementing method and system in file storage
US11461304B2 (en) Signature-based cache optimization for data preparation
US20130218934A1 (en) Method for directory entries split and merge in distributed file system
US20170277556A1 (en) Distribution system, computer, and arrangement method for virtual machine
JP2016529633A (en) Snapshot and clone replication
CN109299190B (en) Method and device for processing metadata of object in distributed storage system
CN111917834A (en) Data synchronization method and device, storage medium and computer equipment
US11675743B2 (en) Web-scale distributed deduplication
EP3362808B1 (en) Cache optimization for data preparation
CN114610680A (en) Method, device and equipment for managing metadata of distributed file system and storage medium
US20220300488A1 (en) Migration of a data blockchain
US10628391B1 (en) Method and system for reducing metadata overhead in a two-tier storage architecture
CN114968095A (en) Distributed hard disk management method, system, electronic device and readable storage medium
Klein et al. Dxram: A persistent in-memory storage for billions of small objects
CN112181899A (en) Metadata processing method and device and computer readable storage medium
US11288447B2 (en) Step editor for data preparation
CN111884940A (en) Interest matching method and device, computer equipment and storage medium
Büchler Indexing Genomic Data on Hadoop

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40033506

Country of ref document: HK

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant