CN103793475A - Distributed file system data migration method - Google Patents

Publication number
CN103793475A
Authority
CN
China
Prior art keywords
data
node
index
file
source node
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410005142.7A
Other languages
Chinese (zh)
Other versions
CN103793475B (en)
Inventor
郭照斌
季旻
姜国梁
马振杰
杨鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUXI CITY CLOUD COMPUTER CENTER CO Ltd
Original Assignee
WUXI CITY CLOUD COMPUTER CENTER CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-01-06
Filing date
2014-01-06
Publication date
2014-05-14
Application filed by WUXI CITY CLOUD COMPUTER CENTER CO Ltd
Priority to CN201410005142.7A (patent CN103793475B)
Publication of CN103793475A
Application granted
Publication of CN103793475B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/185 Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a data migration method for a distributed file system. During data migration, a file that is frequently modified or written is selected as the migration source file; for a file being migrated, modified or newly written data is written directly to the destination node of the migration, an index of the new data is built over the original data, and the unmodified data is then migrated. Compared with conventional cold data migration, the method greatly reduces the time needed for load balancing, saves a large amount of network I/O and disk I/O, and quickly balances the data load across all nodes.

Description

A method for data migration in a distributed file system
Technical field
The present invention relates to the field of computers, and in particular to a method for data migration in a distributed file system.
Background
A distributed file system generally comprises clients, a metadata server, and data servers. The client provides the access interface to file data, the metadata server manages the layout and basic attributes of files, and the data servers store the file contents.
The balance of load and capacity across the data server nodes usually has a large effect on the performance and stability of the whole system. Online capacity expansion, that is, adding new nodes, is an essential feature of a distributed file system, yet adding nodes inevitably leaves capacity and load unbalanced between the old and new nodes. Data migration is the common way to solve this problem.
In traditional data migration, the files selected as source files are those that are not frequently accessed, so that normal writes and migration do not interfere with each other. This approach balances slowly, however, and a modification or write to a file that is being migrated causes the migration to fail, so the data already moved has consumed network bandwidth and disk I/O to no effect.
Summary of the invention
To address the deficiencies of the prior art, the object of the present invention is to provide a method for data migration in a distributed file system. The invention proposes selecting frequently accessed files as source files, which achieves fast balancing without wasting network bandwidth or disk I/O.
The object of the invention is achieved by the following technical solution:
The invention provides a method for data migration in a distributed file system, characterized in that, during data migration, a distributed file that is frequently modified or written is selected as the migration source file; for a distributed file that is being migrated, modified or newly written data is written directly to the destination node of the migration, an index of the new data is built over the original data, and the unmodified data is then migrated.
The method comprises the following steps:
(1) Based on statistics of the number of modify or write accesses to distributed files, determine the distributed files with a high access frequency to be the migration source files.
(2) When data is to be written to a source file, the client obtains the layout information from the metadata server and sends the data to the designated source node (the node where the layout places the source file is a source node; one source file may correspond to multiple source nodes).
(3) The source node creates an index node on the destination node of the migration, then forwards the data to the index node.
(4) After the index node completes the write, it returns to the source node, and the source node updates the index record.
(5) The source node returns to the client, completing the write; this is equivalent to having migrated that block of data.
(6) A background controller migrates the content that has not been written: it copies the data from the source node, writes it to the destination node, and records the write in the index record.
(7) When all content on the source node has been moved to the index node, the metadata server is notified to modify the file layout information and the native object is deleted (the native object is the file data content held on the source node for this file); at this point the migration of the distributed file's data is complete.
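Step (1) can be illustrated with a minimal sketch. The patent only says that files are chosen by modify/write access counts; the function name `select_migration_sources`, the `top_n` cutoff, and the event-list input format are assumptions made for illustration, not part of the described method.

```python
from collections import Counter

def select_migration_sources(write_events, top_n=1):
    """Return the top_n most frequently written file names.

    write_events: iterable of file names, one entry per modify/write access
    (a hypothetical input format).
    """
    counts = Counter(write_events)
    # Sort by descending count, then by name so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

events = ["a.log", "b.dat", "a.log", "c.db", "a.log", "b.dat"]
hot_files = select_migration_sources(events, top_n=2)
```

Here `a.log` (three writes) and `b.dat` (two writes) are selected ahead of `c.db`. A real system would more likely use a frequency threshold or a sliding time window; the text does not specify either.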
Further, writing the modified or newly written data directly to the destination node of the migration is done in one of the following modes:
Mode 1: when the data arrives at the source node, the source node forwards it directly to the destination node of the migration.
Mode 2: at write time, the client writes the data directly to the destination node of the migration, and the object on the source node is then notified.
Further, building the index of the new data over the original data comprises: establishing the index relation between the source node and the destination node using a bitmap file, an array, or a tree structure.
The relation between the source node and the destination node is recorded with 1 bit per 4K, the client's minimum operation unit; each time a write occurs, the entry at the corresponding offset in the bitmap file, array, or tree structure is set to 1.
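The bitmap variant of this index can be sketched as follows. This is an illustrative model only: the class name `MigrationBitmap` and its methods are hypothetical, and it assumes writes are aligned to the 4K unit, so that setting a bit means the whole block is present on the destination.

```python
BLOCK_SIZE = 4096  # 4K, the client's minimum operation unit per the text

class MigrationBitmap:
    """One bit per 4K block: 1 means the block already lives on the destination."""

    def __init__(self, file_size):
        self.num_blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE
        self.bits = bytearray((self.num_blocks + 7) // 8)

    def mark(self, offset, length):
        """Set the bit of every block covered by [offset, offset + length)."""
        if length <= 0:
            return
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for block in range(first, last + 1):
            self.bits[block // 8] |= 1 << (block % 8)

    def is_set(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def all_set(self):
        """True once every block of the file has been written or copied."""
        return all(self.is_set(b) for b in range(self.num_blocks))

bm = MigrationBitmap(file_size=5 * BLOCK_SIZE)
bm.mark(offset=BLOCK_SIZE, length=2 * BLOCK_SIZE)  # covers blocks 1 and 2
```

At 1 bit per 4K, a 1 GiB object needs 262,144 bits, that is, 32 KiB of index, which is why a bitmap is a cheap way to track per-block migration state.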
Further, in step (3), the source node checks whether the content to be read is on the index node; if so, it reads the content from the index node; if not, it reads the local content directly and returns it.
Compared with the prior art, the beneficial effects of the present invention are:
The method for data migration in a distributed file system provided by the invention selects frequently accessed files as source files, which achieves fast balancing without wasting network bandwidth or disk I/O. During data migration, a file that is frequently modified or written is selected as the migration source file; for a file that is being migrated, modified or newly written data is written directly to the destination node of the migration, an index of the new data is built over the original data, and the unmodified data is then migrated.
Brief description of the drawings
Fig. 1 is a flow chart of a write during data migration according to the invention.
Detailed description
The specific embodiments of the invention are described in further detail below with reference to the accompanying drawing.
The invention provides a method for data migration in a distributed file system. During data migration, a distributed file that is frequently modified or written is selected as the migration source file; for a distributed file that is being migrated, modified or newly written data is written directly to the destination node of the migration, an index of the new data is built over the original data, and the unmodified data is then migrated.
The flow of a write during data migration is shown in Fig. 1 and comprises the following steps:
(1) Based on statistics of the number of modify or write accesses to distributed files, determine the distributed files with a high access frequency to be the migration source files.
(2) When data is to be written to a source file, the client obtains the layout information from the metadata server and sends the data to the designated source node (the node where the layout places the source file is a source node; one source file may correspond to multiple source nodes).
(3) The source node creates an index node on the destination node of the migration, then forwards the data to the index node.
(4) After the index node completes the write, it returns to the source node, and the source node updates the index record.
(5) The source node returns to the client, completing the write; this is equivalent to having migrated that block of data.
(6) A background controller migrates the content that has not been written: it copies the data from the source node, writes it to the destination node, and records the write in the index record.
(7) When all content on the source node has been moved to the index node, the metadata server is notified to modify the file layout information and the native object is deleted (the native object is the file data content held on the source node for this file); at this point the migration of the distributed file's data is complete.
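Steps (3) to (7) can be sketched as a small in-memory model. This is a simplified illustration under stated assumptions, not the patented implementation: the class `Migration`, its method names, the per-block boolean list standing in for the index record, and the `layout` string standing in for the metadata server are all hypothetical, and concurrency between client writes and the background controller is ignored.

```python
class Migration:
    """Toy single-threaded model of one object moving from source to destination."""

    def __init__(self, source_blocks):
        self.source = list(source_blocks)              # native object on the source node
        self.dest = [None] * len(source_blocks)        # index node on the destination node
        self.migrated = [False] * len(source_blocks)   # index record, one flag per block
        self.layout = "source"                         # what the metadata server would report

    def client_write(self, block_no, data):
        # Steps (3)-(5): forward the write to the index node, update the
        # index record, then acknowledge. The block now counts as migrated.
        self.dest[block_no] = data
        self.migrated[block_no] = True
        return "ok"

    def background_step(self):
        # Step (6): copy one block that has not been written during migration.
        for i, done in enumerate(self.migrated):
            if not done:
                self.dest[i] = self.source[i]
                self.migrated[i] = True
                return True
        return False  # nothing left to copy

    def finish_if_complete(self):
        # Step (7): switch the layout and delete the native object.
        if all(self.migrated):
            self.layout = "destination"
            self.source = None
            return True
        return False

m = Migration([b"a", b"b", b"c"])
m.client_write(1, b"B")          # a live write lands directly on the destination
while m.background_step():       # the controller copies only blocks 0 and 2
    pass
finished = m.finish_if_complete()
```

The point of the sketch is that block 1 is never copied by the background controller: the client's own write already placed it on the destination, so that transfer is not repeated.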
Writing the modified or newly written data directly to the destination node of the migration is done in one of the following modes:
Mode 1: when the data arrives at the source node, the source node forwards it directly to the destination node of the migration.
Mode 2: at write time, the client writes the data directly to the destination node of the migration, and the object on the source node is then notified.
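The difference between the two modes is who carries the payload to the destination. The sketch below models each mode as a list of messages; the function names and message tuples are hypothetical, and in Mode 2 it is assumed that the client sends the notification, which the text does not state.

```python
def write_mode_1(block_no, data, dest_blocks, source_index):
    """Mode 1: the client sends to the source node, which forwards the payload."""
    hops = [("client", "source", "data"), ("source", "destination", "data")]
    dest_blocks[block_no] = data   # payload lands on the destination node
    source_index.add(block_no)     # source node updates its index record
    return hops

def write_mode_2(block_no, data, dest_blocks, source_index):
    """Mode 2: the client writes to the destination, then the source is notified."""
    # Assumption: the client is the one that notifies the source node.
    hops = [("client", "destination", "data"), ("client", "source", "notify")]
    dest_blocks[block_no] = data
    source_index.add(block_no)
    return hops

def payload_transfers(hops):
    """Count how many messages carry the full data payload."""
    return sum(1 for _, _, kind in hops if kind == "data")

dest, index = {}, set()
hops_1 = write_mode_1(0, b"x" * 4096, dest, index)
hops_2 = write_mode_2(1, b"y" * 4096, dest, index)
```

In this model Mode 1 moves the payload over the network twice and Mode 2 once, at the cost of the client having to know the destination node. Both leave the destination data and the source-side index in the same state.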
Building the index of the new data over the original data comprises: establishing the index relation between the source node and the destination node using a bitmap file, an array, or a tree structure.
A. Recording method of the data index:
The index can be recorded in the form of a bitmap file. The relation between the source object and the destination object is recorded with 1 bit per 4K, the client's minimum operation unit; each time a write occurs, the bit at the corresponding offset in the bitmap is set to 1.
B. The following describes how clients access the data normally during migration:
<1> When a client needs to read a file, it obtains the layout from the metadata server and sends the request to the designated source node.
<2> The source node checks whether the content to be read is in the index object; if so, it reads the content from the index object; if not, it reads the local content directly and returns it.
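The read path in <2> reduces to a lookup in the index record followed by a local or remote read. A minimal sketch, assuming a per-block boolean index and a hypothetical callback `fetch_from_index_object` that stands in for the network read from the destination:

```python
def read_block(block_no, index_record, local_blocks, fetch_from_index_object):
    """Serve a read on the source node while a migration is in progress."""
    if index_record[block_no]:
        # The block was written or copied already: the fresh copy is remote.
        return fetch_from_index_object(block_no)
    # Otherwise the local (not yet migrated) content is still authoritative.
    return local_blocks[block_no]

local = [b"old0", b"old1"]
remote = {1: b"new1"}           # block 1 was rewritten during migration
index_record = [False, True]

first = read_block(0, index_record, local, remote.__getitem__)
second = read_block(1, index_record, local, remote.__getitem__)
```

Block 0 is served locally and block 1 from the index object, so clients see current data throughout the migration without knowing it is under way.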
Compared with traditional cold data migration, the present invention greatly reduces the time needed for load balancing, saves a large amount of network I/O and disk I/O, and quickly balances the data load across all nodes.
Finally, it should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the specific embodiments may still be modified or replaced with equivalents, and that any modification or equivalent replacement that does not depart from the spirit and scope of the invention falls within the scope of its claims.

Claims (4)

1. A method for data migration in a distributed file system, characterized in that the method comprises: during data migration in the distributed file system, selecting a distributed file that is frequently modified or written as the migration source file; for a distributed file that is being migrated, writing modified or newly written data directly to the destination node of the migration, building an index of the new data over the original data, and then migrating the unmodified data;
the method comprising the following steps:
(1) based on statistics of the number of modify or write accesses to distributed files, determining the distributed files with a high access frequency to be the migration source files;
(2) when data is to be written to a source file, the client obtains the layout information from the metadata server and sends the data to the designated source node;
(3) the source node creates an index node on the destination node of the migration, then forwards the data to the index node;
(4) after the index node completes the write, it returns to the source node, and the source node updates the index record;
(5) the source node returns to the client, completing the write, which is equivalent to having migrated that block of data;
(6) a background controller migrates the content that has not been written, copying the data from the source node, writing it to the destination node, and recording the write in the index record;
(7) when all content on the source node has been moved to the index node, the metadata server is notified to modify the file layout information and the native object is deleted, at which point the migration of the distributed file's data is complete.
2. The method of claim 1, characterized in that writing the modified or newly written data directly to the destination node of the migration is done in one of the following modes:
Mode 1: when the data arrives at the source node, the source node forwards it directly to the destination node of the migration;
Mode 2: at write time, the client writes the data directly to the destination node of the migration, and the object on the source node is then notified.
3. The method of claim 1, characterized in that building the index of the new data over the original data comprises: establishing the index relation between the source node and the destination node using a bitmap file, an array, or a tree structure;
the relation between the source node and the destination node is recorded with 1 bit per 4K, the client's minimum operation unit; each time a write occurs, the entry at the corresponding offset in the bitmap file, array, or tree structure is set to 1.
4. The method of claim 1, characterized in that, in step (3), the source node checks whether the content to be read is on the index node; if so, it reads the content from the index node; if not, it reads the local content directly and returns it.
CN201410005142.7A 2014-01-06 2014-01-06 Method for data migration in a distributed file system Active CN103793475B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410005142.7A CN103793475B (en) 2014-01-06 2014-01-06 Method for data migration in a distributed file system

Publications (2)

Publication Number Publication Date
CN103793475A true CN103793475A (en) 2014-05-14
CN103793475B CN103793475B (en) 2017-06-06

Family

ID=50669141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410005142.7A Active CN103793475B (en) 2014-01-06 2014-01-06 Method for data migration in a distributed file system

Country Status (1)

Country Link
CN (1) CN103793475B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110307534A1 (en) * 2009-03-25 2011-12-15 Zte Corporation Distributed file system supporting data block dispatching and file processing method thereof
CN102567444A (en) * 2011-10-25 2012-07-11 无锡城市云计算中心有限公司 Method for optimizing distributed file system data access
CN102841931A (en) * 2012-08-03 2012-12-26 中兴通讯股份有限公司 Storage method and storage device of distributive-type file system
CN103067433A (en) * 2011-10-24 2013-04-24 阿里巴巴集团控股有限公司 Method, device and system of data migration of distributed type storage system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105279166A (en) * 2014-06-20 2016-01-27 中国电信股份有限公司 File management method and system
WO2018036235A1 (en) * 2016-08-22 2018-03-01 中兴通讯股份有限公司 Solr data migration method and apparatus
CN106570093A (en) * 2016-10-24 2017-04-19 南京中新赛克科技有限责任公司 Independent metadata organization structure-based massive data migration method and apparatus
CN106570093B (en) * 2016-10-24 2020-03-27 南京中新赛克科技有限责任公司 Mass data migration method and device based on independent metadata organization structure
US10659531B2 (en) 2017-10-06 2020-05-19 International Business Machines Corporation Initiator aware data migration
CN108848180A (en) * 2018-06-27 2018-11-20 郑州云海信息技术有限公司 A kind of metadata synchronization method, device, equipment and readable storage medium storing program for executing
CN109388610A (en) * 2018-08-30 2019-02-26 中国科学院计算技术研究所 A kind of distributed meta data services migrating method and system of low latency
CN109558457A (en) * 2018-12-11 2019-04-02 浪潮(北京)电子信息产业有限公司 A kind of method for writing data, device, equipment and storage medium
CN109558457B (en) * 2018-12-11 2022-04-22 浪潮(北京)电子信息产业有限公司 Data writing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN103793475B (en) 2017-06-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant