CN103793475A - Distributed file system data migration method - Google Patents
- Publication number
- CN103793475A (application CN201410005142.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- node
- index
- file
- source node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/185—Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a data migration method for a distributed file system. During data migration, a frequently modified or written file is selected as the migration source file. For a file being migrated, modified or newly written data is written directly to the migration destination node, an index of the new data is built on top of the original data, and the unmodified data is then migrated. Compared with conventional cold-data migration, the method can greatly reduce the time needed for load balancing, save a large amount of network I/O and disk I/O, and quickly balance the data load across all nodes.
Description
Technical Field
The present invention relates to the field of computing, and in particular to a data migration method for a distributed file system.
Background
A distributed file system generally comprises clients, a metadata server, and data servers. The client provides the access interface for file data, the metadata server manages the layout and basic attributes of files, and the data servers store the file contents.
The balance of load and capacity across the data server nodes usually has a large impact on the performance and stability of the whole system. Online capacity expansion, that is, adding new nodes, is an essential feature of a distributed file system, but adding a new node inevitably leaves the old and new nodes unbalanced in capacity and load. Data migration is the common way to solve this problem.
Conventional data migration selects files that are not frequently accessed as source files, so that normal writes and migration do not interfere with each other. This approach balances slowly, however, and a modification or write to a file being migrated causes the migration to fail, so the network bandwidth and disk I/O spent on the data already migrated are wasted.
Summary of the Invention
In view of the deficiencies of the prior art, the object of the present invention is to provide a data migration method for a distributed file system. The invention selects frequently accessed files as source files, which achieves fast balancing without wasting network bandwidth or disk I/O.
The object of the invention is achieved by the following technical solution:
The invention provides a data migration method for a distributed file system, the improvement being that the method comprises: during data migration, selecting a frequently modified or written distributed file as the migration source file; for a distributed file being migrated, writing its modified or newly written data directly to the migration destination node; building an index of the new data on top of the original data; and then migrating the unmodified data.
The method comprises the following steps:
(1) Modification and write accesses to distributed files are counted, and a distributed file with a high access frequency is chosen as the migration source file.
(2) When data is to be written to the source file, the client obtains the layout information from the metadata server and sends the data to the designated source node. (The node on which the layout places the source file is the source node; one source file may correspond to multiple source nodes.)
(3) The source node creates an index node on the migration destination node and forwards the data to the index node.
(4) After the index node completes the data write, it returns to the source node, and the source node updates the index record.
(5) The source node returns to the client, completing the write. This is equivalent to migrating that block of data.
(6) A background controller migrates the content that has not been written: it copies the data from the source node, writes it to the destination node, and records the index entries for what was written.
(7) When all content on the source node has been moved to the index node, the metadata server is notified to modify the file layout information and the local object is deleted. (The local object is the file data content held on the source node.) The migration of the distributed file is then complete.
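Step (1) can be sketched as a simple counting pass. This is a minimal illustration, not the patented implementation: the function name `pick_migration_sources`, the `top_n` and `min_writes` parameters, and the representation of write statistics as a list of file paths are all assumptions made for the example.

```python
from collections import Counter


def pick_migration_sources(write_events, top_n=1, min_writes=1):
    """Return the most frequently written file paths, busiest first.

    write_events is a hypothetical log: one file path per modify/write access.
    """
    counts = Counter(write_events)
    return [path for path, n in counts.most_common(top_n) if n >= min_writes]


# Hypothetical write log: "/a" was written three times, "/b" twice, "/c" once.
events = ["/a", "/b", "/a", "/c", "/a", "/b"]
print(pick_migration_sources(events, top_n=2))  # ['/a', '/b']
```

A real system would presumably collect these counts per time window on the data servers or the metadata server; the text does not say where the statistics are kept.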
Further, writing the modified or newly written data directly to the migration destination node is done in one of the following ways:
Mode 1: when the data arrives at the source node, the source node forwards it directly to the migration destination node.
Mode 2: on a write, the client writes the data directly to the migration destination node and then notifies the object on the source node.
Further, building the index of the new data on top of the original data comprises: establishing the index relationship between the source node and the destination node by means of a bitmap, an array, or a tree structure.
The relationship between the source node and the destination node is recorded with one bit per 4K minimum client operation unit. On each write, the entry at the corresponding offset in the bitmap, array, or tree structure is set to 1.
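The bitmap variant of this index can be sketched as follows. This is an illustrative sketch under stated assumptions: the class name `WriteIndex` and its methods are hypothetical, writes are assumed to be aligned to the 4K unit, and a Python integer stands in for the bit field.

```python
BLOCK_SIZE = 4096  # the 4K minimum client operation unit from the text


class WriteIndex:
    """One bit per 4K block; a set bit means the block is already on the destination."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = 0  # Python int used as an arbitrary-length bit field

    def mark_written(self, offset, length):
        """Set the bit of every block covered by a write at this byte range."""
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for block in range(first, last + 1):
            self.bits |= 1 << block

    def is_on_destination(self, block):
        return bool((self.bits >> block) & 1)

    def pending_blocks(self):
        """Blocks that still have to be copied by the background migration."""
        return [b for b in range(self.num_blocks) if not self.is_on_destination(b)]


idx = WriteIndex(num_blocks=4)
idx.mark_written(offset=4096, length=8192)  # covers blocks 1 and 2
print(idx.pending_blocks())                 # [0, 3]
```

The blocks whose bit is still 0 are exactly the unmodified data that remains to be migrated.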
Further, in step (3), the source node checks whether the content to be read is on the index node. If so, it reads the content from the index node; if not, it reads the local content directly and returns it.
Compared with the prior art, the present invention has the following beneficial effects:
The data migration method provided by the invention selects frequently accessed files as source files, which achieves fast balancing without wasting network bandwidth or disk I/O. During data migration, a frequently modified or written file is chosen as the migration source file; for a file being migrated, the modified or newly written data is written directly to the migration destination node, an index of the new data is built on top of the original data, and the unmodified data is then migrated.
Brief Description of the Drawings
Fig. 1 is a flowchart of the write path during data migration provided by the invention.
Detailed Description
The specific embodiment of the present invention is described in further detail below with reference to the accompanying drawing.
The invention provides a data migration method for a distributed file system. The method comprises: during data migration, selecting a frequently modified or written distributed file as the migration source file; for a distributed file being migrated, writing its modified or newly written data directly to the migration destination node; building an index of the new data on top of the original data; and then migrating the unmodified data.
The write path during data migration is shown in Fig. 1 and comprises the following steps:
(1) Modification and write accesses to distributed files are counted, and a distributed file with a high access frequency is chosen as the migration source file.
(2) When data is to be written to the source file, the client obtains the layout information from the metadata server and sends the data to the designated source node (the node on which the layout places the source file).
(3) The source node creates an index node on the migration destination node and forwards the data to the index node.
(4) After the index node completes the data write, it returns to the source node, and the source node updates the index record.
(5) The source node returns to the client, completing the write. This is equivalent to migrating that block of data.
(6) A background controller migrates the content that has not been written: it copies the data from the source node, writes it to the destination node, and records the index entries for what was written.
(7) When all content on the source node has been moved to the index node, the metadata server is notified to modify the file layout information and the local object (the file data content held on the source node) is deleted. The migration of the distributed file is then complete.
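Steps (3) to (7) can be walked through with a small in-memory simulation. This is a sketch under stated assumptions rather than the patented implementation: the classes `DestinationNode` and `SourceNode`, their method names, the dictionary standing in for the metadata server's layout table, and the use of a set in place of the bitmap are all hypothetical, and the sketch is single-threaded, so it ignores the locking a real system would need between client writes and the background copy.

```python
class DestinationNode:
    """Holds the data written to the index node on the migration destination."""

    def __init__(self):
        self.blocks = {}  # block number -> bytes

    def write(self, block, data):
        self.blocks[block] = data
        return True  # acknowledgement returned to the caller


class SourceNode:
    def __init__(self, blocks, dest):
        self.blocks = dict(blocks)  # local object: block number -> bytes
        self.dest = dest
        self.migrated = set()       # index record (a bitmap in the text)

    def client_write(self, block, data):
        """Steps (3)-(5): forward the write, update the index, ack the client."""
        if self.dest.write(block, data):
            self.migrated.add(block)
            return True
        return False

    def background_migrate_one(self):
        """Step (6): copy one block that no client write has moved yet."""
        for block in sorted(self.blocks):
            if block not in self.migrated:
                self.dest.write(block, self.blocks[block])
                self.migrated.add(block)
                return block
        return None  # nothing left to copy

    def finish_if_done(self, layout, file_id, dest_name):
        """Step (7): repoint the layout and delete the local object."""
        if any(b not in self.migrated for b in self.blocks):
            return False
        layout[file_id] = dest_name  # stands in for notifying the metadata server
        self.blocks.clear()
        return True


dest = DestinationNode()
src = SourceNode({0: b"old0", 1: b"old1", 2: b"old2"}, dest)
layout = {"file-1": "source"}

src.client_write(1, b"new1")  # the hot block moves as a side effect of the write
while src.background_migrate_one() is not None:
    pass                      # blocks 0 and 2 are copied in the background
print(src.finish_if_done(layout, "file-1", "destination"), layout)
```

Because the background copy skips every block already recorded in the index, the newer data written by the client is never overwritten by the stale copy on the source node.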
Writing the modified or newly written data directly to the migration destination node is done in one of the following ways:
Mode 1: when the data arrives at the source node, the source node forwards it directly to the migration destination node.
Mode 2: on a write, the client writes the data directly to the migration destination node and then notifies the object on the source node.
Building the index of the new data on top of the original data comprises: establishing the index relationship between the source node and the destination node by means of a bitmap, an array, or a tree structure.
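The difference between the two modes can be illustrated by tracing the messages of a single write. This is a hedged sketch: the message tuples, the function names, and in particular the acknowledgement messages are assumptions made for the example, since the text only specifies who forwards the data and who is notified.

```python
def mode1_trace():
    """Mode 1: the source node relays the payload to the destination node."""
    return [
        ("client", "source", "data"),
        ("source", "destination", "data"),
        ("destination", "source", "ack"),
        ("source", "client", "ack"),
    ]


def mode2_trace():
    """Mode 2: the client writes to the destination, then notifies the source."""
    return [
        ("client", "destination", "data"),
        ("destination", "client", "ack"),   # assumed acknowledgement
        ("client", "source", "notify"),
        ("source", "client", "ack"),        # assumed acknowledgement
    ]


def payload_hops(trace):
    """Count how many times the data payload itself crosses the network."""
    return sum(1 for _, _, kind in trace if kind == "data")


print(payload_hops(mode1_trace()), payload_hops(mode2_trace()))  # 2 1
```

Under these assumptions, Mode 1 keeps the client unaware of the migration at the cost of sending the payload twice, while Mode 2 sends the payload once but requires the client to know the destination node.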
A. Recording the data index:
The index can be recorded as a bitmap. The relationship between the source object and the destination object is recorded with one bit per 4K minimum client operation unit; on each write, the bit at the corresponding offset in the bitmap is set to 1.
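A short worked calculation shows how small such a bitmap is and how a byte offset maps to a bit. The helper names `bitmap_bytes` and `bit_position` are hypothetical, and packing eight bits per byte with the lowest block in the lowest bit is an assumption; the text does not specify the on-disk layout.

```python
BLOCK_SIZE = 4096  # bytes covered by one bit, from the 4K unit in the text


def bitmap_bytes(file_size):
    """Bytes needed for a bitmap covering file_size bytes, rounded up."""
    blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE
    return (blocks + 7) // 8


def bit_position(offset):
    """Map a byte offset in the file to (byte index, bit index) in the bitmap."""
    block = offset // BLOCK_SIZE
    return block // 8, block % 8


print(bitmap_bytes(1 << 30))  # 32768: a 1 GiB object needs a 32 KiB bitmap
print(bit_position(40960))    # (1, 2): offset 40 KiB is block 10
```

At one bit per 4 KiB the index costs 1/32768 of the object size, which is why it can be kept alongside the original data cheaply.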
B. The following describes how the client accesses data normally while migration is in progress:
<1> When the client needs to read a file, it obtains the layout from the metadata server and sends the request to the designated source node.
<2> The source node checks whether the content to be read is in the index object. If so, it reads the content from the index object; if not, it reads the local content directly and returns it.
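The read routing in <2> can be sketched per 4K block. This is an illustration under assumptions: the function `read_range`, the set `migrated` standing in for the index, and the dictionaries `local` and `remote` standing in for the source node's local object and the index object on the destination are all hypothetical.

```python
BLOCK_SIZE = 4096


def read_range(offset, length, migrated, local, remote):
    """Serve a read on the source node, block by block.

    Blocks recorded in the index come from the destination (remote);
    all other blocks come from the local object.
    """
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    parts = []
    for block in range(first, last + 1):
        store = remote if block in migrated else local
        parts.append(store[block])
    data = b"".join(parts)
    start = offset - first * BLOCK_SIZE
    return data[start:start + length]


local = {0: b"a" * BLOCK_SIZE, 1: b"b" * BLOCK_SIZE}  # block 1 here is stale
remote = {1: b"B" * BLOCK_SIZE}                       # block 1 was rewritten
print(read_range(4090, 12, migrated={1}, local=local, remote=remote))
# b'aaaaaaBBBBBB'
```

The read spans the boundary between an unmigrated block and a migrated one, so the result combines local and remote data while the client still talks only to the source node.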
Compared with conventional cold-data migration, the present invention can greatly reduce the time needed for load balancing and save a large amount of network I/O and disk I/O, so that the data load of each node is balanced quickly.
Finally, it should be noted that the above embodiment is intended only to illustrate the technical solution of the present invention and not to limit it. Although the invention has been described in detail with reference to the above embodiment, those of ordinary skill in the art should understand that the specific embodiment may still be modified or equivalently replaced, and any modification or equivalent replacement that does not depart from the spirit and scope of the invention shall fall within the scope of its claims.
Claims (4)
1. A data migration method for a distributed file system, characterized in that the method comprises: during data migration, selecting a frequently modified or written distributed file as the migration source file; for a distributed file being migrated, writing its modified or newly written data directly to the migration destination node; building an index of the new data on top of the original data; and then migrating the unmodified data;
the method comprising the following steps:
(1) counting modification and write accesses to distributed files, and choosing a distributed file with a high access frequency as the migration source file;
(2) when data is to be written to the source file, the client obtaining the layout information from the metadata server and sending the data to the designated source node;
(3) the source node creating an index node on the migration destination node and forwarding the data to the index node;
(4) the index node returning to the source node after completing the data write, and the source node updating the index record;
(5) the source node returning to the client, completing the write, which is equivalent to migrating that block of data;
(6) a background controller migrating the content that has not been written, copying the data from the source node, writing it to the destination node, and recording the index entries for what was written;
(7) when all content on the source node has been moved to the index node, notifying the metadata server to modify the file layout information and deleting the local object, whereupon the migration of the distributed file is complete.
2. The method of claim 1, characterized in that writing the modified or newly written data directly to the migration destination node is done in one of the following ways:
Mode 1: when the data arrives at the source node, the source node forwards it directly to the migration destination node;
Mode 2: on a write, the client writes the data directly to the migration destination node and then notifies the object on the source node.
3. The method of claim 1, characterized in that building the index of the new data on top of the original data comprises: establishing the index relationship between the source node and the destination node by means of a bitmap, an array, or a tree structure;
the relationship between the source node and the destination node being recorded with one bit per 4K minimum client operation unit, and, on each write, the entry at the corresponding offset in the bitmap, array, or tree structure being set to 1.
4. The method of claim 1, characterized in that, in step (3), the source node checks whether the content to be read is on the index node; if so, it reads the content from the index node; if not, it reads the local content directly and returns it.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410005142.7A CN103793475B (en) | 2014-01-06 | 2014-01-06 | A kind of method of Distributed File System Data migration |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410005142.7A CN103793475B (en) | 2014-01-06 | 2014-01-06 | A kind of method of Distributed File System Data migration |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103793475A true CN103793475A (en) | 2014-05-14 |
CN103793475B CN103793475B (en) | 2017-06-06 |
Family
ID=50669141
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410005142.7A Active CN103793475B (en) | 2014-01-06 | 2014-01-06 | A kind of method of Distributed File System Data migration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103793475B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105279166A (en) * | 2014-06-20 | 2016-01-27 | 中国电信股份有限公司 | File management method and system |
CN106570093A (en) * | 2016-10-24 | 2017-04-19 | 南京中新赛克科技有限责任公司 | Independent metadata organization structure-based massive data migration method and apparatus |
WO2018036235A1 (en) * | 2016-08-22 | 2018-03-01 | 中兴通讯股份有限公司 | Solr data migration method and apparatus |
CN108848180A (en) * | 2018-06-27 | 2018-11-20 | 郑州云海信息技术有限公司 | A kind of metadata synchronization method, device, equipment and readable storage medium storing program for executing |
CN109388610A (en) * | 2018-08-30 | 2019-02-26 | 中国科学院计算技术研究所 | A kind of distributed meta data services migrating method and system of low latency |
CN109558457A (en) * | 2018-12-11 | 2019-04-02 | 浪潮(北京)电子信息产业有限公司 | A kind of method for writing data, device, equipment and storage medium |
US10659531B2 (en) | 2017-10-06 | 2020-05-19 | International Business Machines Corporation | Initiator aware data migration |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110307534A1 (en) * | 2009-03-25 | 2011-12-15 | Zte Corporation | Distributed file system supporting data block dispatching and file processing method thereof |
CN102567444A (en) * | 2011-10-25 | 2012-07-11 | 无锡城市云计算中心有限公司 | Method for optimizing distributed file system data access |
CN102841931A (en) * | 2012-08-03 | 2012-12-26 | 中兴通讯股份有限公司 | Storage method and storage device of distributive-type file system |
CN103067433A (en) * | 2011-10-24 | 2013-04-24 | 阿里巴巴集团控股有限公司 | Method, device and system of data migration of distributed type storage system |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105279166A (en) * | 2014-06-20 | 2016-01-27 | 中国电信股份有限公司 | File management method and system |
WO2018036235A1 (en) * | 2016-08-22 | 2018-03-01 | 中兴通讯股份有限公司 | Solr data migration method and apparatus |
CN106570093A (en) * | 2016-10-24 | 2017-04-19 | 南京中新赛克科技有限责任公司 | Independent metadata organization structure-based massive data migration method and apparatus |
CN106570093B (en) * | 2016-10-24 | 2020-03-27 | 南京中新赛克科技有限责任公司 | Mass data migration method and device based on independent metadata organization structure |
US10659531B2 (en) | 2017-10-06 | 2020-05-19 | International Business Machines Corporation | Initiator aware data migration |
CN108848180A (en) * | 2018-06-27 | 2018-11-20 | 郑州云海信息技术有限公司 | A kind of metadata synchronization method, device, equipment and readable storage medium storing program for executing |
CN109388610A (en) * | 2018-08-30 | 2019-02-26 | 中国科学院计算技术研究所 | A kind of distributed meta data services migrating method and system of low latency |
CN109558457A (en) * | 2018-12-11 | 2019-04-02 | 浪潮(北京)电子信息产业有限公司 | A kind of method for writing data, device, equipment and storage medium |
CN109558457B (en) * | 2018-12-11 | 2022-04-22 | 浪潮(北京)电子信息产业有限公司 | Data writing method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN103793475B (en) | 2017-06-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||