CN102360382B - High-speed object-based parallel storage system directory replication method - Google Patents
- Publication number
- CN102360382B
- Authority
- CN
- China
- Prior art keywords
- file
- catalogue
- directory
- copy
- character string
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention discloses a high-speed directory replication method for object-based parallel storage systems. It addresses the problems that directory replication in such systems is carried out on a client node, that the serialized process incurs a large time overhead, and that the client node becomes a communication bottleneck because all data must be relayed through it during replication. The technical solution is as follows: a directory replication system is constructed, the directory replication task is divided into replication tasks for individual files, and these file replication tasks are distributed to multiple object storage servers for parallel execution, with each file copied directly from the source object storage server to the destination object storage server without being relayed through the client node. The invention greatly reduces replication time; because no relay through the client node is needed, the client node does not become a communication bottleneck; and because the source file and the target file are never located on the same object storage server during replication, a single point of failure is avoided.
Description
Technical field
The present invention relates to performance optimization methods for object-based parallel storage systems, and in particular to a high-speed directory replication method.
Background art
A parallel storage system is an important component of a massively parallel computer system, and the object-based parallel storage system is one type of parallel storage architecture. As shown in Fig. 1, an object-based parallel storage system consists of one metadata server (MDS), multiple object storage servers (OSTs) and multiple client nodes (CNs) interconnected by a network. The metadata server and the object storage servers provide client nodes with file access services such as data reading, writing and storage. The metadata server provides the metadata service to client nodes and is specific to object storage systems. Metadata describes each file's owner, creation time, modification time, size and distribution across the object storage system. The object storage servers hold the actual file data and, under the management of the metadata server, provide file access services to client nodes. A client node's file access request is first submitted to the metadata server to obtain the file's distribution information; the client node then submits access requests to the relevant object storage servers according to that information to complete the read or write. Object storage systems exploit the parallelism of high-bandwidth networks and storage access to give parallel applications higher data access bandwidth, and have been widely used in computer systems ranked among the world's top ten supercomputers. Directory copy is one of the basic functions a parallel storage system provides to client nodes: a client node copies a directory and all files under it from one location in the parallel storage system to another.
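The two-step access path described in this background (metadata lookup first, then data access on the object storage servers) can be illustrated with a minimal Python sketch. The dictionaries `mds_layout` and `ost_objects` and the function `read_file` are hypothetical stand-ins invented for illustration, and the assumption that a file may be split across several OSTs in layout order is ours; the source only says that metadata records the file's distribution.

```python
from typing import Dict, List


def read_file(path: str,
              mds_layout: Dict[str, List[str]],
              ost_objects: Dict[str, Dict[str, bytes]]) -> bytes:
    """Read a file the way a client node would: metadata first, then data.

    mds_layout  : toy MDS table, file path -> ordered list of OST names (assumed)
    ost_objects : toy OST contents, OST name -> {file path: stored bytes} (assumed)
    """
    # Step 1: ask the (toy) metadata server which OSTs hold the file.
    ost_ids = mds_layout[path]
    # Step 2: fetch the pieces from those OSTs and join them in layout order.
    return b"".join(ost_objects[ost_id][path] for ost_id in ost_ids)
```

In this toy model the metadata server never touches file data; it only answers the layout query, which is the property the replication method later relies on.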
At present, the common directory replication method for object-based parallel storage systems is centralized: the client node manages the entire replication process. The user submits the operating system's standard directory copy command cp on a client node; the client node reads the files under the directory one by one from the source object storage servers into local memory and then writes them to the destination object storage servers. Every copy passes through the client node, so the client node becomes a performance bottleneck. No other published literature addresses directory replication methods for object-based parallel storage file systems.
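For contrast with the method proposed later, the centralized approach can be sketched as a serial loop in which every byte passes through the caller's memory. This is only an illustration on a local file system standing in for the parallel storage system; the function name `centralized_copy` and the returned byte counter are assumptions added here, not part of the source.

```python
import os


def centralized_copy(src_dir: str, dest_dir: str) -> int:
    """Copy a directory tree serially through the caller's memory.

    Returns the number of bytes relayed through the "client node".
    """
    os.makedirs(dest_dir, exist_ok=True)
    relayed = 0
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dest = os.path.join(dest_dir, name)
        if os.path.isdir(src):
            # Subdirectories are handled one after another, never in parallel.
            relayed += centralized_copy(src, dest)
        else:
            with open(src, "rb") as f:
                data = f.read()   # source storage -> client memory
            with open(dest, "wb") as f:
                f.write(data)     # client memory -> destination storage
            relayed += len(data)
    return relayed
```

The returned count equals the total size of the copied files, which makes the bottleneck visible: the relayed volume grows with the directory size regardless of how many storage servers exist.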
The storage capacity of current object-based parallel storage systems exceeds hundreds of terabytes, the number of files exceeds several million, and thousands of file system clients may be online at the same time. The time and communication overhead of the centralized directory replication method is very large, so a new high-speed directory replication method is a technical problem of great concern to those skilled in the art.
Summary of the invention
The technical problem to be solved by the present invention is as follows. In an object-based parallel storage system, directory copy is carried out centrally on the client node; the serialized process incurs a large time overhead, and because the data being copied must be relayed through the client node, the client node becomes a communication bottleneck. To address these problems, a parallelized high-speed directory replication method is proposed. The technical solution of the invention is: construct a directory replication system, decompose the directory replication task into replication tasks for individual files, and distribute these file copy tasks to multiple object storage servers for parallel execution; each file is copied directly from the source object storage server to the destination object storage server, without being relayed through the client node.
The specific technical solution is as follows:
Step 1: construct the directory replication system. The directory replication system is a directory replication software module deployed on the object storage system. It consists of a directory replication task management module, a load distribution module and a file copy execution module; the directory replication task management module and the load distribution module are deployed on the client node, and the file copy execution module is deployed on every object storage server (OST). The directory replication task management module receives the directory copy command, recursively resolves it into ordinary directory copy commands, creates the target directories, and issues the ordinary directory copy commands to the load distribution module. The load distribution module receives an ordinary directory copy command, generates file copy commands from it, looks up the object storage server on which each source file resides, and issues each file copy command to the file copy execution module on that server. The file copy execution module receives a file copy command, determines the object storage server on which the target file will reside, and performs the file copy. The directory copy command is a basic file operation command provided by the operating system; it copies all files and subdirectories under the source directory to the target directory, and its format is cp srcdirA destdirB. A directory contains files and subdirectories, and a subdirectory may in turn contain files and further subdirectories. The ordinary directory copy command is a file operation command defined by the present invention; it copies only the files directly under source directory A to target directory B, without copying the subdirectories under A, and its format is localcp srcdirA destdirB. srcdirA and destdirB are absolute paths. An absolute path is the full path used to access a directory or file in the file system.
Step 2: the directory replication task management module, deployed on the client node, receives the directory copy command cp srcdirA destdirB submitted by the user. With srcdirA and destdirB as input parameters, it calls the directory copy procedure DirCopy to copy the directory.
The directory copy procedure DirCopy is recursive. Its input parameters are SRCDIR and DESTDIR, where SRCDIR is the source directory and DESTDIR is the target directory. DirCopy proceeds as follows:
2.1 If the target directory does not exist, create the target directory DESTDIR.
2.2 Issue the ordinary directory copy command localcp SRCDIR DESTDIR to the load distribution module.
2.3 Read the names of all subdirectories under the source directory SRCDIR and store them in the queue dirlist. Each element of dirlist is a character string holding one subdirectory name.
2.4 If dirlist is empty, stop the recursion of DirCopy and go to step 2.6.
2.5 Starting from the head of dirlist, read each character string in turn and process it as follows:
Assign the character string to workdir.
Construct the absolute path string Asrcdir of the source directory by concatenating SRCDIR, the directory separator "/" and workdir, i.e. Asrcdir=SRCDIR+"/"+workdir.
Construct the absolute path string Adestdir of the target directory by concatenating DESTDIR, the directory separator "/" and workdir, i.e. Adestdir=DESTDIR+"/"+workdir.
Query the metadata server MDS to check whether the target directory with absolute path Adestdir exists. If it does not, request the MDS to create it.
Recursively call the directory copy procedure DirCopy with input parameters Asrcdir and Adestdir.
2.6 The directory copy procedure DirCopy ends.
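The DirCopy procedure of steps 2.1 to 2.6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `ToyNamespace` is a hypothetical in-memory stand-in for the directory metadata held by the MDS, and `issue_localcp` is a hypothetical callback standing in for "issue the localcp command to the load distribution module".

```python
from collections import deque
from typing import Callable, Dict, List, Set


class ToyNamespace:
    """Hypothetical in-memory stand-in for MDS directory metadata."""

    def __init__(self, subdirs: Dict[str, List[str]]):
        self.subdirs = subdirs                 # directory path -> subdirectory names
        self.existing: Set[str] = set(subdirs)

    def list_subdirs(self, path: str) -> List[str]:
        return list(self.subdirs.get(path, []))

    def exists(self, path: str) -> bool:
        return path in self.existing

    def mkdir(self, path: str) -> None:
        self.existing.add(path)


def dir_copy(ns: ToyNamespace, srcdir: str, destdir: str,
             issue_localcp: Callable[[str, str], None]) -> None:
    # 2.1: create the target directory if it does not exist.
    if not ns.exists(destdir):
        ns.mkdir(destdir)
    # 2.2: hand the files directly under srcdir to the load distribution module.
    issue_localcp(srcdir, destdir)
    # 2.3: queue the subdirectory names.
    dirlist = deque(ns.list_subdirs(srcdir))
    # 2.4: an empty queue means the loop body never runs and recursion stops here.
    # 2.5: process each subdirectory name from the head of the queue.
    while dirlist:
        workdir = dirlist.popleft()
        asrcdir = srcdir + "/" + workdir
        adestdir = destdir + "/" + workdir
        if not ns.exists(adestdir):
            ns.mkdir(adestdir)
        dir_copy(ns, asrcdir, adestdir, issue_localcp)
    # 2.6: the procedure ends.
```

Calling `dir_copy` on a small tree emits one localcp command per directory in depth-first order, so the recursion itself moves no file data; it only produces per-directory work items.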
Step 3: the load distribution module, deployed on the client node, receives the ordinary directory copy command localcp SRCDIR DESTDIR issued by the directory replication task management module. It decomposes the ordinary directory copy command into multiple file copy commands and issues each one to the file copy execution module on the object storage server where the source file resides. The steps are as follows:
3.1 Read the names of all files under the source directory SRCDIR and store each file name as a character string in the queue filelist. Each element of filelist is a character string holding one file name.
3.2 If filelist is empty, go to Step 4.
3.3 Starting from the head of filelist, read each character string in turn and process it as follows:
Assign the character string to workfile.
Construct the absolute path string Asrcfile of the source file by concatenating SRCDIR, the directory separator "/" and workfile, i.e. Asrcfile=SRCDIR+"/"+workfile.
Construct the absolute path string Adestfile of the target file by concatenating DESTDIR, the directory separator "/" and workfile, i.e. Adestfile=DESTDIR+"/"+workfile.
Query the MDS for the object storage server on which the source file with absolute path Asrcfile resides, denoted srcOST.
Issue the file copy command filecp Asrcfile Adestfile to the file copy execution module on srcOST.
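Steps 3.1 to 3.3 amount to turning one localcp command into per-file filecp commands grouped by the server that holds each source file. The Python sketch below illustrates this under assumptions: `list_files` and `locate_ost` are hypothetical callbacks standing in for the directory listing and the MDS lookup, and returning the commands as a dictionary (rather than sending them over the network) is a simplification made for illustration.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List, Tuple

FileCp = Tuple[str, str, str]  # ("filecp", Asrcfile, Adestfile)


def distribute_localcp(srcdir: str, destdir: str,
                       list_files: Callable[[str], List[str]],
                       locate_ost: Callable[[str], str]) -> Dict[str, List[FileCp]]:
    """Decompose `localcp srcdir destdir` into filecp commands keyed by srcOST."""
    per_ost: Dict[str, List[FileCp]] = defaultdict(list)
    # 3.1: queue the names of the files directly under srcdir.
    filelist = deque(list_files(srcdir))
    # 3.2: an empty queue yields no commands.
    # 3.3: process each file name from the head of the queue.
    while filelist:
        workfile = filelist.popleft()
        asrcfile = srcdir + "/" + workfile
        adestfile = destdir + "/" + workfile
        src_ost = locate_ost(asrcfile)   # ask the (toy) MDS where the source lives
        per_ost[src_ost].append(("filecp", asrcfile, adestfile))
    return dict(per_ost)
```

Because the commands are keyed by the source server, each server only receives work for files it already holds locally, which is what lets the copies run in parallel without the client relaying data.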
Step 4: the file copy execution module receives the file copy command filecp Asrcfile Adestfile issued by the load distribution module; multiple file copy execution modules perform their file copy operations in parallel. A file copy operation proceeds as follows:
4.1 Request the MDS to create the target file Adestfile, and read from the file attributes the object storage server on which the target file resides, denoted destOST. In a typical object-based parallel storage file system, the MDS allocates storage for successive file creation requests to different OSTs in rotation, so repeated creations of the same file are also assigned to different OSTs.
4.2 If destOST is the OST on which this file copy execution module resides, delete Adestfile and go to step 4.1.
4.3 Perform the file copy: copy the file Asrcfile from the local OST to destOST.
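The create, check and retry loop of steps 4.1 to 4.3 can be sketched in Python as follows. The class `ToyMDS`, its round-robin allocation via `itertools.cycle`, and the `storage` dictionary are hypothetical stand-ins chosen to mirror the rotation behaviour described in step 4.1. The `max_attempts` bound is an assumption added here so the sketch terminates even with a single OST; the source describes an unbounded return to step 4.1.

```python
import itertools
from typing import Dict, List


class ToyMDS:
    """Hypothetical metadata server that places new files on OSTs in rotation."""

    def __init__(self, ost_names: List[str]):
        self._next = itertools.cycle(ost_names)
        self.layout: Dict[str, str] = {}   # file path -> OST name

    def create(self, path: str) -> str:
        ost = next(self._next)
        self.layout[path] = ost
        return ost

    def delete(self, path: str) -> None:
        del self.layout[path]


def execute_filecp(mds: ToyMDS, local_ost: str,
                   storage: Dict[str, Dict[str, bytes]],
                   asrcfile: str, adestfile: str,
                   max_attempts: int = 8) -> str:
    """Run `filecp asrcfile adestfile` on the OST that holds the source file."""
    for _ in range(max_attempts):          # retry bound is an added assumption
        dest_ost = mds.create(adestfile)   # 4.1: create target, learn its OST
        if dest_ost != local_ost:
            # 4.3: copy directly from the local OST to the destination OST.
            storage[dest_ost][adestfile] = storage[local_ost][asrcfile]
            return dest_ost
        mds.delete(adestfile)              # 4.2: same OST, delete and retry
    raise RuntimeError("no destination OST other than the local one was offered")
```

With three OSTs and a source file on the first one offered by the rotation, the first creation is rejected and deleted, and the second attempt lands on a different server, so the source and target copies end up on separate OSTs.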
Compared with the prior art, the present invention achieves the following technical effects:
1. In Step 3, the invention distributes the directory replication task across multiple object storage servers for concurrent execution, significantly reducing replication time;
2. In Step 4, the invention copies each file directly from the source object storage server to the destination object storage server without relaying it through the client node, so the client node does not become a communication bottleneck;
3. In Step 4, the invention ensures during replication that the source file and the target file are not on the same object storage server, avoiding a single point of failure.
Description of drawings
Fig. 1 is a structural diagram of the object-based parallel storage system.
Fig. 2 is a structural diagram of the directory replication system.
Fig. 3 is the overall flow chart of the present invention.
Embodiment
Fig. 1 is a structural diagram of the object-based parallel storage system. The system consists of one metadata server (MDS), multiple object storage servers (OSTs) and multiple client nodes (CNs) interconnected by a network. The metadata server and the object storage servers provide client nodes with file access services such as data reading, writing and storage.
Fig. 2 is a structural diagram of the directory replication system. The directory replication system consists of a directory replication task management module, a load distribution module and a file copy execution module; the directory replication task management module and the load distribution module are deployed on the client node, and the file copy execution module is deployed on every object storage server. The directory replication task management module resolves the directory copy command and issues ordinary directory copy commands to the load distribution module. The load distribution module resolves each ordinary directory copy command and issues file copy commands to the file copy execution modules. The file copy execution module performs the file copy.
Fig. 3 is the overall flow chart of the present invention.
Step 1: construct the directory replication system. The directory replication system is directory replication software deployed on the object storage system, and consists of a directory replication task management module, a load distribution module and a file copy execution module. The directory replication task management module receives the directory copy command, recursively resolves it into ordinary directory copy commands, creates the target directories, and issues the ordinary directory copy commands to the load distribution module. The load distribution module receives an ordinary directory copy command, generates file copy commands from it, looks up the object storage server on which each source file resides, and issues the file copy commands to the file copy execution modules. The file copy execution module receives a file copy command, determines the object storage server on which the target file will reside, and performs the file copy.
Step 2: the directory replication task management module receives the directory copy command, creates the target directories, and issues ordinary directory copy commands to the load distribution module.
Step 3: the load distribution module receives the ordinary directory copy commands and issues file copy commands to the file copy execution modules.
Step 4: the file copy execution modules receive the file copy commands and perform the file copy operations in parallel.
Claims (1)
1. A high-speed directory replication method for an object-based parallel storage system, characterized by comprising the following steps:
Step 1: construct the directory replication system, the directory replication system being a directory replication software module deployed on the object storage system and consisting of a directory replication task management module, a load distribution module and a file copy execution module; the directory replication task management module and the load distribution module are deployed on the client node, and the file copy execution module is deployed on every object storage server, i.e. OST; the directory replication task management module receives the directory copy command, recursively resolves it into ordinary directory copy commands, creates the target directories, and issues the ordinary directory copy commands to the load distribution module; the load distribution module receives an ordinary directory copy command, generates file copy commands from it, looks up the object storage server on which each source file resides, and issues each file copy command to the file copy execution module on the object storage server where the source file resides; the file copy execution module receives a file copy command, determines the object storage server on which the target file will reside, and performs the file copy; the directory copy command is a basic file operation command provided by the operating system, which copies all files and subdirectories under the source directory to the target directory, and its format is cp srcdirA destdirB; a directory contains files and subdirectories, and a subdirectory may in turn contain files and further subdirectories; the ordinary directory copy command is a file operation command that copies only the files directly under source directory A to target directory B, without copying the subdirectories under A, and its format is localcp srcdirA destdirB; srcdirA and destdirB are absolute paths, an absolute path being the full path used to access a directory or file in the file system;
Step 2: the directory replication task management module, deployed on the client node, receives the directory copy command cp srcdirA destdirB submitted by the user and, with srcdirA and destdirB as input parameters, calls the directory copy procedure DirCopy to copy the directory;
the directory copy procedure DirCopy is recursive; its input parameters are SRCDIR and DESTDIR, where SRCDIR is the source directory and DESTDIR is the target directory; DirCopy proceeds as follows:
2.1 if the target directory does not exist, create the target directory DESTDIR;
2.2 issue the ordinary directory copy command localcp SRCDIR DESTDIR to the load distribution module;
2.3 read the names of all subdirectories under the source directory SRCDIR and store them in the queue dirlist, each element of dirlist being a character string holding one subdirectory name;
2.4 if dirlist is empty, stop the recursion of DirCopy and go to step 2.6;
2.5 starting from the head of dirlist, read each character string in turn and process it as follows:
assign the character string to workdir;
construct the absolute path string Asrcdir of the source directory by concatenating SRCDIR, the directory separator "/" and workdir, i.e. Asrcdir=SRCDIR+"/"+workdir;
construct the absolute path string Adestdir of the target directory by concatenating DESTDIR, the directory separator "/" and workdir, i.e. Adestdir=DESTDIR+"/"+workdir;
query the metadata server MDS to check whether the target directory with absolute path Adestdir exists and, if it does not, request the MDS to create it;
recursively call the directory copy procedure DirCopy with input parameters Asrcdir and Adestdir;
2.6 the directory copy procedure DirCopy ends;
Step 3: the load distribution module, deployed on the client node, receives the ordinary directory copy command localcp SRCDIR DESTDIR issued by the directory replication task management module, decomposes it into multiple file copy commands, and issues each one to the file copy execution module on the object storage server where the source file resides, the steps being as follows:
3.1 read the names of all files under the source directory SRCDIR and store each file name as a character string in the queue filelist, each element of filelist being a character string holding one file name;
3.2 if filelist is empty, go to Step 4;
3.3 starting from the head of filelist, read each character string in turn and process it as follows:
assign the character string to workfile;
construct the absolute path string Asrcfile of the source file by concatenating SRCDIR, the directory separator "/" and workfile, i.e. Asrcfile=SRCDIR+"/"+workfile;
construct the absolute path string Adestfile of the target file by concatenating DESTDIR, the directory separator "/" and workfile, i.e. Adestfile=DESTDIR+"/"+workfile;
query the MDS for the object storage server on which the source file with absolute path Asrcfile resides, denoted srcOST;
issue the file copy command filecp Asrcfile Adestfile to the file copy execution module on srcOST;
Step 4: the file copy execution module receives the file copy command filecp Asrcfile Adestfile issued by the load distribution module, multiple file copy execution modules performing their file copy operations in parallel, a file copy operation proceeding as follows:
4.1 request the MDS to create the target file Adestfile, and read from the file attributes the object storage server on which the target file resides, denoted destOST;
4.2 if destOST is the OST on which this file copy execution module resides, delete Adestfile and go to step 4.1;
4.3 perform the file copy: copy the file Asrcfile from the local OST to destOST.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110310368 CN102360382B (en) | 2011-10-13 | 2011-10-13 | High-speed object-based parallel storage system directory replication method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110310368 CN102360382B (en) | 2011-10-13 | 2011-10-13 | High-speed object-based parallel storage system directory replication method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102360382A (en) | 2012-02-22 |
CN102360382B (en) | 2013-04-10 |
Family
ID=45585711
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201110310368 Expired - Fee Related CN102360382B (en) | 2011-10-13 | 2011-10-13 | High-speed object-based parallel storage system directory replication method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102360382B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN102360382A (en) | 2012-02-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20130410. Termination date: 20151013 |
EXPY | Termination of patent right or utility model | |