CN105930545A - Method and device for migrating files - Google Patents

Publication number
CN105930545A
Authority
CN
China
Prior art keywords
file
node
distributed sub-volume
weight
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610512718.8A
Other languages
Chinese (zh)
Other versions
CN105930545B (en)
Inventor
江萍
于相洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Uniview Technologies Co Ltd
Original Assignee
Zhejiang Uniview Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Uniview Technologies Co Ltd
Priority to CN201610512718.8A
Publication of CN105930545A
Application granted
Publication of CN105930545B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/119 Details of migration of file systems

Abstract

The invention provides a method and a device for migrating files. The method comprises the following steps: selecting a first file with the highest priority from all files to be migrated; selecting a first node by using the weight of each node in a global data set, wherein the global data set records the correspondence between each node in a distributed storage system and its weight; and migrating the first file from a source distributed sub-volume to a target distributed sub-volume via the first node. With this technical scheme, all nodes included in the source distributed sub-volume can participate in file migration, which can improve the file migration speed, avoid excessive pressure on any single node, avoid degrading the processing performance of each node, and balance the load across the nodes, while ensuring the data consistency of the files before and after migration.

Description

Method and device for migrating files
Technical field
The present invention relates to the field of storage technology, and in particular to a method and a device for migrating files.
Background
In a distributed storage system, data reliability and space utilization are difficult to achieve together: guaranteeing relatively high data reliability leads to relatively low space utilization. DHT (Distributed Hash Table) is a distributed storage scheme with relatively high space utilization. To ensure data reliability and fault tolerance, an EC (Erasure Coding) algorithm can also be used with DHT.
Fig. 1 is a networking schematic diagram of a distributed storage system. When file A needs to be written, the write location, for example distributed sub-volume D1, is first obtained through the DHT algorithm. File A is then written, through the EC algorithm, to the nodes of distributed sub-volume D1. For example, file A is divided into 3 parts to obtain 3 pieces of data, and these 3 pieces of data are written to node 1, node 2, and node 3 of distributed sub-volume D1 respectively. In addition, the redundant data corresponding to file A can be written to node 4 of distributed sub-volume D1.
However, when the write location is obtained through the DHT algorithm, the following situation may occur: many files are written to distributed sub-volume D1 while few files are written to distributed sub-volume D2, so an imbalance arises and the performance of the DHT is affected. To solve this problem, a data balancing (Rebalance) algorithm can be used to migrate files among the distributed sub-volumes so that the number of files in each distributed sub-volume is similar, thereby ensuring the performance of the DHT. For example, file A and file B in distributed sub-volume D1 are migrated to distributed sub-volume D2, so that the numbers of files in distributed sub-volume D1 and distributed sub-volume D2 are roughly balanced.
In current file migration, all files are usually migrated by node 1; for example, node 1 migrates file A and file B in distributed sub-volume D1 to distributed sub-volume D2. When many files need to be migrated, the working pressure on node 1 is very high, the migration is slow, and the processing performance of node 1 is affected.
Summary of the invention
The present invention provides a method for migrating files, the method comprising the following steps:
selecting the first file with the highest priority from all files to be migrated;
selecting a first node by using the weight of each node in a global data set, wherein the global data set records the correspondence between each node in the distributed storage system and its weight;
migrating the first file from a source distributed sub-volume to a target distributed sub-volume via the first node;
increasing the weight of the first node in the global data set, and setting the first file as no longer a file to be migrated; if files to be migrated still exist, returning to the step of selecting the first file with the highest priority from all files to be migrated.
The process of selecting the first node by using the weight of each node in the global data set specifically includes: obtaining each node included in the source distributed sub-volume by querying the local set corresponding to the source distributed sub-volume where the first file is located; querying the global data set for the weight of each node included in the source distributed sub-volume, and selecting the node with the smallest weight as the first node; or,
selecting the node with the smallest weight from the global data set, and judging whether the currently selected node is in the local set corresponding to the source distributed sub-volume where the first file is located; if so, determining the currently selected node as the first node; if not, selecting the node with the second smallest weight from the global data set and continuing to judge whether the currently selected node is in the local set, until the currently selected node is in the local set, and determining the currently selected node as the first node;
wherein the local set records each node included in the source distributed sub-volume.
After the first file is migrated from the source distributed sub-volume to the target distributed sub-volume via the first node, the method further includes: once the first node has finished migrating the first file from the source distributed sub-volume to the target distributed sub-volume, decreasing the weight of the first node in the global data set.
During the migration of the first file from the source distributed sub-volume to the target distributed sub-volume, the method further includes: obtaining first file attribute information corresponding to the first file, the first file attribute information including a marker indicating that the file is being migrated and information about the target distributed sub-volume;
according to the first file attribute information, determining that the first file is being migrated, and sending the update data of the first file to both the source distributed sub-volume and the target distributed sub-volume.
After the first node has finished migrating the first file from the source distributed sub-volume to the target distributed sub-volume, the method further includes: obtaining second file attribute information corresponding to the first file, the second file attribute information either including a marker indicating that the file has completed migration or not including the marker indicating that the file is being migrated; according to the second file attribute information, determining that the first file has completed migration, and sending the update data of the first file to the target distributed sub-volume.
The present invention provides a device for migrating files, the device specifically including:
a first selection module, configured to select the first file with the highest priority from all files to be migrated;
a second selection module, configured to select a first node by using the weight of each node in a global data set, wherein the global data set records the correspondence between each node in the distributed storage system and its weight;
a migration module, configured to migrate the first file from a source distributed sub-volume to a target distributed sub-volume via the first node;
a processing module, configured to increase the weight of the first node in the global data set and set the first file as no longer a file to be migrated; if files to be migrated still exist, the first selection module again performs the process of selecting the first file with the highest priority from all files to be migrated.
The second selection module is specifically configured to, in the process of selecting the first node by using the weight of each node in the global data set, obtain each node included in the source distributed sub-volume by querying the local set corresponding to the source distributed sub-volume where the first file is located; query the global data set for the weight of each node included in the source distributed sub-volume, and select the node with the smallest weight as the first node; or, select the node with the smallest weight from the global data set, and judge whether the currently selected node is in the local set corresponding to the source distributed sub-volume where the first file is located; if so, determine the currently selected node as the first node; if not, select the node with the second smallest weight from the global data set and continue to judge whether the currently selected node is in the local set, until the currently selected node is in the local set, and determine the currently selected node as the first node; wherein the local set records each node included in the source distributed sub-volume.
The processing module is further configured to, after the first file is migrated from the source distributed sub-volume to the target distributed sub-volume via the first node, decrease the weight of the first node in the global data set once the first node has finished migrating the first file from the source distributed sub-volume to the target distributed sub-volume.
The device further includes:
an acquisition module, configured to obtain, during the migration of the first file from the source distributed sub-volume to the target distributed sub-volume, first file attribute information corresponding to the first file, the first file attribute information including a marker indicating that the file is being migrated and information about the target distributed sub-volume;
a sending module, configured to determine, according to the first file attribute information, that the first file is being migrated, and to send the update data of the first file to both the source distributed sub-volume and the target distributed sub-volume.
The acquisition module is further configured to obtain, after the first file has been migrated from the source distributed sub-volume to the target distributed sub-volume, second file attribute information corresponding to the first file, the second file attribute information either including a marker indicating that the file has completed migration or not including the marker indicating that the file is being migrated;
the sending module is further configured to determine, according to the second file attribute information, that the first file has completed migration, and to send the update data of the first file to the target distributed sub-volume.
Based on the above technical scheme, in the embodiments of the present invention, when files need to be migrated, the node used for file migration can be selected dynamically, so that all nodes included in the source distributed sub-volume participate in the file migration. When many files need to be migrated, multiple nodes complete the migration jointly, which can improve the file migration speed, avoid excessive pressure on any single node, avoid degrading the processing performance of each node, and balance the load across the nodes, while ensuring data consistency before and after the file migration.
Brief description of the drawings
In order to describe the technical schemes in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some of the embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a networking schematic diagram of a distributed storage system;
Fig. 2 is a schematic diagram of file migration between distributed sub-volumes;
Fig. 3 is a flowchart of the file migration method in one embodiment of the present invention;
Fig. 4 is a flowchart of the file migration method in another embodiment of the present invention;
Fig. 5 is a schematic diagram of the file migration process in another embodiment of the present invention;
Fig. 6 is a flowchart of the file migration method in another embodiment of the present invention;
Fig. 7 is a hardware structure diagram of the control device in one embodiment of the present invention;
Fig. 8 is a structure diagram of the file migration device in one embodiment of the present invention.
Detailed description
The terms used in the present invention are for the purpose of describing particular embodiments only and are not intended to limit the present invention. The singular forms "a", "the", and "said" used in the present invention and the claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in the present invention to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of the present invention, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. In addition, depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
To address the problems in the prior art, an embodiment of the present invention proposes a method for migrating files. The method can be applied in a distributed storage system, such as a distributed storage system that uses the DHT algorithm together with the EC algorithm. In this distributed storage system, a file is written to one distributed sub-volume, and each distributed sub-volume spans N+M nodes. Each node is equivalent to a storage device. A distributed sub-volume occupies a storage area in each node, and files are stored in that storage area. The storage area may be a physical disk or another storage structure; the embodiments of the present invention do not limit this storage area.
As shown in Fig. 1, when file A needs to be written, the write location, for example distributed sub-volume D1, is first obtained through the DHT algorithm. File A is then written, through the EC algorithm, to the N+M nodes of distributed sub-volume D1 (in Fig. 1, N is 3 and M is 1). For example, file A is divided into N parts to obtain N pieces of data, these N pieces of data are written to N nodes of distributed sub-volume D1 respectively, and the redundant data corresponding to file A is written to the M nodes of distributed sub-volume D1. The redundant data corresponding to file A can be determined by the EC algorithm, for example as the check data of file A; the embodiments of the present invention do not limit this. The purpose of the redundant data is that, when data of file A is lost, file A can be recovered from the redundant data. For example, when the data that file A stores on node 1 is lost, the redundant data stored on node 4 can be used to recover the lost data and thereby obtain file A.
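The split-and-recover behaviour described above can be illustrated with a minimal sketch. It assumes N = 3 and M = 1 as in Fig. 1 and uses a single XOR parity chunk as a simple stand-in for the EC algorithm, which the description does not specify; the function names `split_with_parity` and `recover_chunk` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def _xor_columns(chunks: List[bytes]) -> bytes:
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def split_with_parity(data: bytes, n: int = 3) -> List[bytes]:
    """Split data into n equal-size chunks plus one XOR parity chunk (M = 1).

    Assumption: XOR parity stands in for the unspecified EC algorithm.
    """
    chunk_len = -(-len(data) // n)  # ceiling division
    padded = data.ljust(chunk_len * n, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(n)]
    return chunks + [_xor_columns(chunks)]


def recover_chunk(pieces: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single missing piece (marked None) by XOR-ing the others."""
    missing = pieces.index(None)
    present = [p for p in pieces if p is not None]
    rebuilt = _xor_columns(present)
    return pieces[:missing] + [rebuilt] + pieces[missing + 1:]


# Pieces 0..2 would go to node 1..3 and piece 3 (parity) to node 4.
pieces = split_with_parity(b"file A contents")
damaged = [None] + pieces[1:]          # the data stored on node 1 is lost
restored = recover_chunk(damaged)
```

With one parity chunk this sketch tolerates the loss of any single piece; a real erasure code with M > 1 would tolerate more.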
In one example, as shown in Fig. 2, when many files are written to distributed sub-volume D1 and few files are written to distributed sub-volume D2, so that an imbalance arises, the data balancing algorithm can be used to migrate files among the distributed sub-volumes so that the number of files in each distributed sub-volume is similar, thereby ensuring the performance of the DHT. For example, file A, file B, and file C in distributed sub-volume D1 are migrated to distributed sub-volume D2, so that the numbers of files in distributed sub-volume D1 and distributed sub-volume D2 are roughly balanced. In this example, distributed sub-volume D1 is referred to as the source distributed sub-volume, distributed sub-volume D2 is referred to as the target distributed sub-volume, and file A, file B, and file C are referred to as files to be migrated.
In the embodiments of the present invention, the file migration method can be applied on a control device, and the control device may be a client, a server, or a control node. The client may be located on a PC (Personal Computer) and assists each node in completing the file migration function. The server may be located on a cloud platform or another network and assists each node in completing the file migration function. In one example, all nodes (that is, all storage devices) of the distributed storage system can be treated as one cluster, and one node is selected from all the nodes (for example, the first node is selected) to control the cluster. This node can be referred to as the control node; in essence it is one of the storage devices in the cluster. The file migration method proposed in the embodiments of the present invention can also be applied on this control node.
In one example, a global data set can be preconfigured, and a local set can be configured for each distributed sub-volume. The global data set records the correspondence between each node in the distributed storage system and its weight, and the local set corresponding to a distributed sub-volume records each node included in that distributed sub-volume. In one example, the global data set can be implemented as a global data table or as another type of file, and the local set can be implemented as a local table or as another type of file; the embodiments of the present invention do not limit the form of the global data set or the local set.
For example, in the initial state, the distributed storage system includes node 1, node 2, node 3, and node 4, and the files in distributed sub-volume D1, distributed sub-volume D2, and distributed sub-volume D3 occupy node 1, node 2, node 3, and node 4. Subsequently, the distributed storage system is expanded by adding node 5, and the files in distributed sub-volume D4 occupy node 1, node 2, node 3, and node 5.
On this basis, the global data set can be as shown in Table 1. The local set corresponding to distributed sub-volume D1 can include node 1, node 2, node 3, and node 4; the local sets of the other distributed sub-volumes are similar to that of distributed sub-volume D1 and are not described again. In another example, the local set of each distributed sub-volume can also be recorded in the local table shown in Table 2. In practical applications, each node in Table 1 and Table 2 can be represented by its unique identifier, such as its IP address. In Table 1, the initial value of the weight corresponding to each node can be 0, and this weight is adjusted subsequently.
Table 1
Node                  Weight
Node 1 (e.g., IP1)    0
Node 2 (e.g., IP2)    0
Node 3 (e.g., IP3)    0
Node 4 (e.g., IP4)    0
Node 5 (e.g., IP5)    0
Table 2
Distributed sub-volume       Nodes
Distributed sub-volume D1    Node 1 (e.g., IP1), node 2 (e.g., IP2), node 3 (e.g., IP3), node 4 (e.g., IP4)
Distributed sub-volume D2    Node 1 (e.g., IP1), node 2 (e.g., IP2), node 3 (e.g., IP3), node 4 (e.g., IP4)
Distributed sub-volume D3    Node 1 (e.g., IP1), node 2 (e.g., IP2), node 3 (e.g., IP3), node 4 (e.g., IP4)
Distributed sub-volume D4    Node 1 (e.g., IP1), node 2 (e.g., IP2), node 3 (e.g., IP3), node 5 (e.g., IP5)
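As a rough illustration, Table 1 and Table 2 could be held in memory as two dictionaries. This is only a sketch: the description leaves the storage form open, and the identifiers `global_weights`, `local_sets`, and `weights_for_subvolume` are hypothetical names, with short strings standing in for the node IP addresses.

```python
from typing import Dict, List

# Table 1: global data set, node identifier -> weight (initial value 0).
global_weights: Dict[str, int] = {f"node{i}": 0 for i in range(1, 6)}

# Table 2: local table, distributed sub-volume -> nodes it spans.
local_sets: Dict[str, List[str]] = {
    "D1": ["node1", "node2", "node3", "node4"],
    "D2": ["node1", "node2", "node3", "node4"],
    "D3": ["node1", "node2", "node3", "node4"],
    "D4": ["node1", "node2", "node3", "node5"],
}


def weights_for_subvolume(name: str) -> Dict[str, int]:
    """Look up the current weight of every node in one sub-volume's local set."""
    return {node: global_weights[node] for node in local_sets[name]}
```

Looking up `weights_for_subvolume("D4")` returns the weights of node 1, node 2, node 3, and node 5 only, which is the view needed when choosing a node for a file stored in D4.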
In the above application scenario, as shown in Fig. 3, the file migration method may include the following steps:
Step 301: select the first file with the highest priority from all files to be migrated.
In one example, the file size of each file to be migrated can be obtained, and a priority is assigned to each file to be migrated according to its file size: the larger the file, the higher its priority. Based on the priority of each file to be migrated, the first file with the highest priority can be selected from all files to be migrated. Other ways of assigning priorities to the files to be migrated can of course also be used; the embodiments of the present invention do not limit the way priorities are assigned.
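A minimal sketch of the size-based priority rule, assuming the pending files are held in a dictionary of name to size in bytes. The function name `pick_highest_priority` and the example sizes are hypothetical.

```python
from typing import Dict


def pick_highest_priority(pending_sizes: Dict[str, int]) -> str:
    """Return the pending file with the largest size, i.e. the highest priority.

    Ties are resolved in favour of the file listed first (an assumption;
    the description does not say how ties are handled).
    """
    if not pending_sizes:
        raise ValueError("no files to be migrated")
    return max(pending_sizes, key=pending_sizes.get)


# Hypothetical sizes in bytes for the files to be migrated.
first_file = pick_highest_priority({"A": 300, "B": 200, "C": 100})
```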
Step 302: select a first node by using the weight of each node in the global data set.
Step 303: migrate the first file from the source distributed sub-volume to the target distributed sub-volume via the first node. That is, a migration notification for the first file is issued to the first node, and after receiving this migration notification, the first node migrates the first file from the source distributed sub-volume to the target distributed sub-volume.
Step 304: increase the weight of the first node in the global data set, and set the first file as no longer a file to be migrated. If files to be migrated still exist, return to step 301; otherwise, end the flow.
When the first file is migrated by the first node, increasing the weight of the first node prevents the first node from always having the smallest weight, and thus prevents a single node from migrating all files to be migrated. All nodes included in the source distributed sub-volume therefore participate in file migration, which improves the file migration speed, avoids excessive pressure on any single node, avoids degrading the processing performance of each node, and balances the load across the nodes.
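The loop formed by steps 301 to 304 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the weight is increased by 1 per assigned file (the description does not give the increment), the migration notification is replaced by recording the assignment in a plan, and `plan_migration` and its parameters are hypothetical names.

```python
from typing import Dict, List, Tuple


def plan_migration(
    pending_sizes: Dict[str, int],
    weights: Dict[str, int],
    source_nodes: List[str],
) -> Tuple[List[Tuple[str, str]], Dict[str, int]]:
    """Assign each pending file to a node of the source sub-volume."""
    weights = dict(weights)        # work on copies, leave the inputs untouched
    pending = dict(pending_sizes)
    plan: List[Tuple[str, str]] = []
    while pending:                                                # files remain
        first_file = max(pending, key=pending.get)                # step 301
        first_node = min(source_nodes, key=lambda n: weights[n])  # step 302
        plan.append((first_file, first_node))                     # step 303 (stand-in)
        weights[first_node] += 1                                  # step 304, assumed +1
        del pending[first_file]                                   # no longer pending
    return plan, weights


plan, new_weights = plan_migration(
    {"A": 300, "B": 200, "C": 100},
    {"node1": 0, "node2": 0, "node3": 0, "node4": 0},
    ["node1", "node2", "node3", "node4"],
)
```

Because the chosen node's weight rises after each assignment, the three files in this example end up on three different nodes instead of all being handled by node 1.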
In one example, the process of selecting the first node by using the weight of each node in the global data set may include, but is not limited to, the following ways. Way one: obtain each node included in the source distributed sub-volume by querying the local set corresponding to the source distributed sub-volume where the first file is located; query the global data set for the weight of each node included in the source distributed sub-volume, and select the node with the smallest weight as the first node. Way two: select the node with the smallest weight from the global data set, and judge whether the currently selected node is in the local set corresponding to the source distributed sub-volume where the first file is located; if so, determine the currently selected node as the first node; if not, select the node with the second smallest weight from the global data set and continue to judge whether the currently selected node is in the local set, until the currently selected node is in the local set, and determine the currently selected node as the first node.
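The two selection ways can be sketched side by side. The function names and the example weights below are hypothetical; the weights are chosen so that the globally lightest node (node 5) is not part of the source sub-volume, which forces way two to move on to the next candidate.

```python
from typing import Dict, List


def select_way_one(weights: Dict[str, int], local_set: List[str]) -> str:
    """Way one: restrict to the local set first, then take the smallest weight."""
    return min(local_set, key=lambda node: weights[node])


def select_way_two(weights: Dict[str, int], local_set: List[str]) -> str:
    """Way two: walk the global set in ascending weight order until a node
    that belongs to the local set is found."""
    members = set(local_set)
    for node in sorted(weights, key=weights.get):
        if node in members:
            return node
    raise LookupError("no node of the local set appears in the global data set")


# Hypothetical weights; node5 is lightest but is not in sub-volume D1.
example_weights = {"node1": 3, "node2": 1, "node3": 2, "node4": 4, "node5": 0}
d1_nodes = ["node1", "node2", "node3", "node4"]
chosen_one = select_way_one(example_weights, d1_nodes)
chosen_two = select_way_two(example_weights, d1_nodes)
```

Both ways arrive at the lightest node inside the local set; way one filters first and way two sorts first, so which is cheaper depends on the relative sizes of the global and local sets.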
With the above implementation, the first node is guaranteed to be a storage node of the first file, which reduces data transmission, avoids wasting transmission resources, and improves resource utilization. Moreover, the first node is guaranteed to be the node with the smallest weight among the storage nodes of the first file (that is, the node with the lightest transmission pressure). The node with the lightest transmission pressure is therefore used to migrate the first file with the highest priority, and the first file with the highest priority is the largest file to be migrated. In other words, the technical scheme of the embodiments of the present invention uses the node with the lightest transmission pressure to migrate the largest file to be migrated, so that the transmission performance of each node is used reasonably and the load is balanced among the nodes.
In one example, after the first file is migrated from the source distributed sub-volume to the target distributed sub-volume via the first node, that is, once the first node has finished migrating the first file from the source distributed sub-volume to the target distributed sub-volume, the weight of the first node can also be decreased in the global data set.
With the above implementation, after the migration of the first file completes, decreasing the weight of the first node allows the first node to transmit other files again. This ensures reasonable use of each node and avoids the problem that the first node still cannot be selected to transmit files after the first file migration completes because its weight remains too high.
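The weight then behaves like a count of migrations in flight on each node: it rises when a file is assigned and falls when that migration completes. The sketch below assumes a step of 1 in both directions and adds a lock on the assumption that completions may be reported concurrently; neither detail comes from the description, and `WeightTable` and its methods are hypothetical names.

```python
import threading
from typing import Dict, Iterable


class WeightTable:
    """Global data set whose weights track in-flight migrations per node."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self._weights: Dict[str, int] = {node: 0 for node in nodes}
        self._lock = threading.Lock()  # assumption: updates may be concurrent

    def begin_migration(self, node: str) -> None:
        """Increase the weight when a file is assigned to the node."""
        with self._lock:
            self._weights[node] += 1

    def finish_migration(self, node: str) -> None:
        """Decrease the weight once the node reports completion (never below 0)."""
        with self._lock:
            self._weights[node] = max(0, self._weights[node] - 1)

    def weight(self, node: str) -> int:
        with self._lock:
            return self._weights[node]


table = WeightTable(["node1", "node2"])
table.begin_migration("node1")
weight_during = table.weight("node1")
table.finish_migration("node1")
weight_after = table.weight("node1")
```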
In one example, during the migration of the first file from the source distributed sub-volume to the target distributed sub-volume, first file attribute information corresponding to the first file can also be obtained. The first file attribute information may include, but is not limited to, a marker indicating that the file is being migrated and information about the target distributed sub-volume. Further, according to the first file attribute information, it can be determined that the first file is being migrated, and the update data of the first file is sent to both the source distributed sub-volume and the target distributed sub-volume.
In one example, after the first node has finished migrating the first file from the source distributed sub-volume to the target distributed sub-volume, second file attribute information corresponding to the first file can also be obtained. The second file attribute information either includes a marker indicating that the file has completed migration or does not include the marker indicating that the file is being migrated. Further, according to the second file attribute information, it can be determined that the first file has completed migration, and the update data of the first file is sent to the target distributed sub-volume.
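The routing of update data in these two cases can be sketched as a small decision function. The `FileAttributes` structure and the function `update_destinations` are hypothetical; the description does not specify how the attribute information is stored, and the sketch covers only the two cases described (during migration and after completion).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileAttributes:
    """Hypothetical attribute record for a file involved in a migration."""
    migrating: bool          # True while the "being migrated" marker is set
    target_subvolume: str    # the target distributed sub-volume


def update_destinations(source_subvolume: str, attrs: FileAttributes) -> List[str]:
    """Return the sub-volumes that must receive update data for the file."""
    if attrs.migrating:
        # During migration: write to both copies so neither falls behind.
        return [source_subvolume, attrs.target_subvolume]
    # Migration completed (marker cleared): only the target holds the file.
    return [attrs.target_subvolume]


during = update_destinations("D1", FileAttributes(migrating=True, target_subvolume="D2"))
after = update_destinations("D1", FileAttributes(migrating=False, target_subvolume="D2"))
```

Writing to both sub-volumes while the marker is set is what keeps the file consistent before and after the migration: an update that arrives mid-copy is not lost on either side.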
Based on the above technical scheme, in the embodiments of the present invention, when files need to be migrated, the node used for file migration can be selected dynamically, so that all nodes included in the source distributed sub-volume participate in the file migration. When many files need to be migrated, multiple nodes complete the migration jointly, which can improve the file migration speed, avoid excessive pressure on any single node, avoid degrading the processing performance of each node, and balance the load across the nodes, while ensuring data consistency before and after the file migration.
The file migration method of the embodiments of the present invention is described below with reference to the flowchart shown in Fig. 4.
Step 401: preconfigure a global data set, and configure a local set for each distributed sub-volume. The global data set records the correspondence between each node in the distributed storage system and its weight, and the local set corresponding to a distributed sub-volume records each node included in that distributed sub-volume.
For convenience of description, the following takes the global data set shown in Table 3 and the local sets shown in Table 2 as an example.
Table 3
In one example, after Rebalance mechanism is triggered, then need to migrate in source distribution formula volume File.If the file in only one of which source distribution formula volume needs to migrate, then for this source distribution formula Involve in row subsequent treatment.If having the file in multiple source distribution formula volume to need to migrate, the most each source distribution The processing mode of formula volume is identical, follow-up illustrates as a example by the process of source distribution formula volume.
Step 402: the file size of each file to be migrated is obtained, and the files to be migrated are prioritized by file size: the larger the file, the higher its priority.
For example, when file A, file B, and file C in distributed subvolume D1 need to be migrated to distributed subvolume D2, D1 is called the source distributed subvolume, D2 is called the target distributed subvolume, and file A, file B, and file C are called the files to be migrated. The file sizes of file A, file B, and file C are obtained, and the three files are prioritized accordingly. For example, if file A is larger than file B and file B is larger than file C, then file A has a higher priority than file B, and file B has a higher priority than file C.
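The size-based ordering of step 402 can be sketched as a simple sort. This is a minimal illustration, not the patented implementation: the function name `prioritize` and the byte sizes are assumptions, since the description only states that file A is larger than file B, which is larger than file C.

```python
# Hypothetical sizes in bytes: the description only states that
# file A is larger than file B, which is larger than file C.
file_sizes = {"file B": 200, "file C": 100, "file A": 300}


def prioritize(sizes):
    """Order file names from highest to lowest priority (largest file first)."""
    return sorted(sizes, key=sizes.get, reverse=True)


migration_order = prioritize(file_sizes)
print(migration_order)
```

The first element of the returned list plays the role of the "first file" selected in step 403.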
Step 403: the first file, that is, the file with the highest priority, is selected from all files to be migrated. For example, file A, which has the highest priority, is selected from file A, file B, and file C as the first file.
Step 404: the node with the smallest weight is selected from the global data set. For example, based on the global data set shown in Table 3, the node with the smallest weight is node 2.
Here, the node with the smallest weight in the global data set is the node under the lightest transmission load.
Step 405: it is determined whether the currently selected node is in the local set of the source distributed subvolume where the first file resides. If so, step 406 is performed. If not, the node with the second-smallest weight is selected from the global data set and step 405 is performed again. If the result is again negative, the node with the third-smallest weight is selected from the global data set and step 405 is performed again, and so on, until the currently selected node is in that local set, at which point step 406 is performed.
For example, it is determined whether the currently selected node 2 is in the local set of distributed subvolume 1, where file A resides. Querying Table 2 shows that node 2 is in the local set of distributed subvolume 1.
Step 406: the currently selected node is determined to be the first node; that is, the first node is node 2.
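Steps 404 to 406 amount to trying nodes in ascending order of weight until one belongs to the local set of the source distributed subvolume. The following is a minimal sketch under stated assumptions: the function name `select_first_node`, the weights, and the local set membership are illustrative and are not taken from Table 2 or Table 3.

```python
def select_first_node(global_weights, local_set):
    """Try nodes in ascending weight order; return the first one in local_set."""
    for node in sorted(global_weights, key=global_weights.get):
        if node in local_set:
            return node
    raise LookupError("no node of the source subvolume is in the global data set")


# Illustrative weights only; they are not the values of Table 3.
global_weights = {"node 1": 2, "node 2": 1, "node 3": 2, "node 4": 3, "node 5": 0}
# Assumed local set of the source distributed subvolume (node 5 is not in it).
local_set_d1 = {"node 1", "node 2", "node 3", "node 4"}

first_node = select_first_node(global_weights, local_set_d1)
print(first_node)
```

In this made-up data the globally lightest node (node 5) is outside the local set, so the sketch falls through to the second-lightest node, which mirrors the retry described in step 405.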
Step 407: the first file is migrated from the source distributed subvolume to the target distributed subvolume by the first node. That is, a migration notification for the first file is issued to the first node, and after receiving the notification, the first node migrates the first file from the source distributed subvolume to the target distributed subvolume.
Because the first node is node 2, the node with the smallest weight in the global data set, the largest file, file A, is transferred by the node under the lightest transmission load, so the transmission capacity of each node is used sensibly.
In one example, while node 2 migrates file A from distributed subvolume D1 to distributed subvolume D2, node 2 obtains one part of file A's data from node 1 and another part from node 3. Node 2 combines the part of file A stored on node 2 with the parts from node 1 and node 3 to obtain the complete file A. Node 2 then writes file A to distributed subvolume D2. For example, node 2 splits file A into 3 parts, writes the 3 parts to node 1, node 2, and node 3 of distributed subvolume D2 respectively, and writes the redundant data for file A to node 4 of distributed subvolume D2.
When node 2 collects the data of file A, node 4 stores only the redundant data for file A, so node 2 does not need to collect the redundant data stored on node 4.
In one example, after node 2 has finished migrating file A from distributed subvolume D1 to distributed subvolume D2, node 2 may also delete all data of file A from distributed subvolume D1. For example, node 2 deletes all data of file A from node 1, node 2, node 3, and node 4.
With this implementation, because node 2 is itself a storage node of file A, it already holds its own part of file A and only needs to collect data from node 1 and node 3. This reduces the amount of data transferred, avoids wasting transmission resources, and improves resource utilization.
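The saving described above can be made concrete with a small sketch that lists which nodes the migrating node has to read from. The `placement` dictionary and the function `remote_reads` are hypothetical names; the layout (data parts on nodes 1 to 3, redundant data on node 4) follows the example in the text.

```python
# Placement of file A in the source subvolume, as in the example above:
# three data parts plus one redundant part.
placement = {"node 1": "data", "node 2": "data", "node 3": "data", "node 4": "redundant"}


def remote_reads(placement, migrating_node):
    """Nodes the migrating node must fetch data parts from over the network."""
    return [
        node
        for node, kind in placement.items()
        if kind == "data" and node != migrating_node
    ]


print(remote_reads(placement, "node 2"))
```

If the migrating node held no data part of the file, it would have to read from all three data nodes, which is why choosing a node from the local set of the source subvolume tends to reduce transfers.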
Step 408: the weight of the first node is increased in the global data set, and the first file is marked as no longer a file to be migrated. For example, the weight of node 2 is increased in the global data set, as shown in Table 4, and file A is marked as no longer a file to be migrated, because it is either being migrated or has already been migrated.
When the weight of node 2 is increased, it may be increased by 1, 2, or 3; the embodiments of the present invention place no limit on the increment, which can be configured freely.
Table 4
Node                    Weight
Node 1 (e.g., IP1)      2
Node 2 (e.g., IP2)      3
Node 3 (e.g., IP3)      1
Node 4 (e.g., IP4)      3
Node 5 (e.g., IP5)      3
Step 409: it is determined whether any files to be migrated remain. If so, the flow returns to step 403 to select the highest-priority first file from the remaining files to be migrated; otherwise, the flow ends.
Because files to be migrated still remain, namely file B and file C, file B, which now has the highest priority, is selected from the files to be migrated; node 3, which has the smallest weight, is selected from the global data set; file B is migrated from distributed subvolume D1 to distributed subvolume D2 by node 3; the weight of node 3 is increased in the global data set; file B is marked as no longer a file to be migrated; and step 409 is performed again.
Because a file to be migrated still remains, namely file C, file C, which now has the highest priority, is selected from the files to be migrated; node 1, which has the smallest weight, is selected from the global data set; file C is migrated from distributed subvolume D1 to distributed subvolume D2 by node 1; the weight of node 1 is increased in the global data set; file C is marked as no longer a file to be migrated; and step 409 is performed again.
Because no files to be migrated remain, the flow ends and does not return to step 403.
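The loop formed by steps 402 to 409 can be sketched as follows. The sketch starts from the weights of Table 4 and replays the assignment of file B and file C. Several things are assumptions made only for illustration: the function name `plan_migration`, the file sizes, the membership of the local set, and the increment of 2. For brevity the sketch filters the global data set by the local set and takes the minimum, which yields the same node as retrying through ascending weights.

```python
def plan_migration(file_sizes, global_weights, local_set, increment=2):
    """Assign each pending file to a node, largest file first (steps 402-409)."""
    weights = dict(global_weights)   # work on a copy of the global data set
    pending = dict(file_sizes)
    plan = []
    while pending:                                        # step 409
        first_file = max(pending, key=pending.get)        # steps 402-403
        candidates = [n for n in weights if n in local_set]
        first_node = min(candidates, key=weights.get)     # steps 404-406
        plan.append((first_file, first_node))             # step 407 would notify the node
        weights[first_node] += increment                  # step 408
        del pending[first_file]                           # no longer to be migrated
    return plan, weights


# Weights from Table 4, i.e. after file A was handed to node 2.
table_4 = {"node 1": 2, "node 2": 3, "node 3": 1, "node 4": 3, "node 5": 3}
local_set_d1 = {"node 1", "node 2", "node 3", "node 4"}   # assumed membership
remaining = {"file B": 200, "file C": 100}                # hypothetical sizes

plan, final_weights = plan_migration(remaining, table_4, local_set_d1)
print(plan)
```

Under these assumptions file B goes to node 3 and file C goes to node 1, matching the walkthrough above. The sketch only produces the assignment; the actual data transfer of step 407 is outside its scope.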
In one example, node 2's migration of file A from distributed subvolume D1 to distributed subvolume D2 is independent of the re-execution of steps 403 to 409; steps 403 to 409 can be performed while node 2 is still migrating file A from distributed subvolume D1 to distributed subvolume D2.
In one example, after the first node has finished migrating the first file from the source distributed subvolume to the target distributed subvolume, the weight of the first node may be reduced in the global data set. Reducing the weight once the migration is complete makes the first node available to transfer other files again, which keeps each node reasonably used and avoids the situation where the first node cannot be chosen to transfer files because its weight remains too high after the migration has finished. For example, after node 2 has finished migrating file A from distributed subvolume D1 to distributed subvolume D2, the weight of node 2 may be reduced in the global data set; if the weight was earlier increased by 2, it can now be reduced by 2.
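The pairing of a weight increase when a migration is assigned with a weight decrease when it completes can be sketched as below. The class and method names are hypothetical, and the starting weight is illustrative; the step of 2 follows the example in the text.

```python
class GlobalDataSet:
    """Node-to-weight mapping with the optional decrease on completion."""

    def __init__(self, weights):
        self.weights = dict(weights)

    def migration_started(self, node, step=2):
        # Step 408: the node is now busier, so raise its weight.
        self.weights[node] += step

    def migration_finished(self, node, step=2):
        # Optional refinement: release the load once the file has been migrated.
        self.weights[node] -= step


gds = GlobalDataSet({"node 2": 1})   # starting weight is illustrative
gds.migration_started("node 2")
busy = gds.weights["node 2"]
gds.migration_finished("node 2")
idle = gds.weights["node 2"]
print(busy, idle)
```

Using the same step for both calls returns the node to its earlier weight, so the weight reflects only migrations that are still in progress.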
Based on the above technical solution, in the embodiments of the present invention, the node that carries out a file migration can be selected dynamically whenever files need to be migrated, so that every node contained in the source distributed subvolume can take part in file migration. When many files need to be migrated, multiple nodes complete the migration jointly, which increases migration speed, prevents any single node from being overloaded, leaves the processing performance of each node unaffected, balances the load across nodes, and preserves data consistency before and after the migration.
In practice, even if a node is responsible for migrating multiple files, the storage space of each file within a distributed subvolume is independent, so file read conflicts do not arise.
In one example, Fig. 5 is a schematic diagram of the file migration process. When an upper-layer service needs to write a file to distributed subvolume D1, the file can be split into N+M parts, which are written to N+M nodes. When the file needs to be migrated, then based on the flow shown in Fig. 4, the first node collects the N data parts, recomputes the N+M parts, and migrates them to the nodes of distributed subvolume D2.
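The text does not say which redundancy code produces the M redundant parts. Purely as an illustration, the sketch below uses single XOR parity (M = 1, with N = 3 as in the file A example) as a stand-in; the function names and the sample content are assumptions.

```python
def split_with_parity(data, n):
    """Split data into n equal-length parts plus one XOR parity part (M = 1)."""
    part_len = -(-len(data) // n)                      # ceiling division
    padded = data.ljust(part_len * n, b"\0")           # zero-pad the last part
    parts = [padded[i * part_len:(i + 1) * part_len] for i in range(n)]
    parity = bytes(part_len)
    for part in parts:
        parity = bytes(a ^ b for a, b in zip(parity, part))
    return parts, parity


def recover_part(parts, parity, missing_index):
    """Rebuild one lost data part from the surviving parts and the parity part."""
    rebuilt = parity
    for i, part in enumerate(parts):
        if i != missing_index:
            rebuilt = bytes(a ^ b for a, b in zip(rebuilt, part))
    return rebuilt


content = b"contents of file A"
parts, parity = split_with_parity(content, 3)
restored = b"".join(parts)[:len(content)]
```

Reassembling the file needs only the N data parts, which is consistent with the first node collecting N parts and recomputing the redundancy before writing to the target subvolume. A real system with M greater than 1 would need a proper erasure code rather than XOR parity.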
Normally, while a file on distributed subvolume D1 is being migrated, the upper-layer service may still be operating on it. Therefore, while the first file is being migrated from the source distributed subvolume to the target distributed subvolume, first file attribute information can be obtained; based on it, the first file is determined to be in the process of migration, and update data for the first file is sent to both the source distributed subvolume and the target distributed subvolume. After the first node has finished migrating the first file from the source distributed subvolume to the target distributed subvolume, second file attribute information can be obtained; based on it, the first file is determined to have completed migration, and update data for the first file is sent to the target distributed subvolume.
Because update data for the first file is sent to the source distributed subvolume and the target distributed subvolume at the same time, the file is not corrupted even if the update data sent to one of the two subvolumes is lost, which ensures the file's data consistency before and after migration.
In one example, the above steps of the embodiments of the present invention can be performed by an API (Application Programming Interface) module deployed on the control device. In this application scenario, the file migration method is described in detail below with reference to the flowchart shown in Fig. 6.
Step 601: the upper-layer service continuously writes to file A on distributed subvolume D1.
Step 602: the API module writes the data of file A to distributed subvolume D1.
Step 603: when file A needs to be migrated, the API module obtains the file attribute information of file A, which may include a marker indicating that the file is being migrated and information about distributed subvolume D2.
Step 604: based on this file attribute information, the API module determines that file A is being migrated and sends update data for file A to both distributed subvolume D1 and distributed subvolume D2.
Meanwhile, the data of file A that has already been stored is being migrated from distributed subvolume D1 to distributed subvolume D2.
In one example, the update data for file A may be data appended to the end of file A, or it may be a modification of file A (such as replacing data in file A or deleting data from file A).
When the update data is data appended to the end of file A, the length of the data to be migrated from distributed subvolume D1 to distributed subvolume D2 has already been determined, so the update data sent to distributed subvolume D1 is not migrated to distributed subvolume D2 and does not affect the original data.
When the update data is a modification of file A, there are two cases. If file A on distributed subvolume D1 has been modified after the update data was sent to D1, then file A is not modified again after it has been migrated to distributed subvolume D2. If file A on distributed subvolume D1 has not been modified after the update data was sent to D1, then after file A has been migrated to distributed subvolume D2, the update data for file A that is also held on D2 can be used to modify file A.
Because update data for the first file is sent to the source distributed subvolume and the target distributed subvolume at the same time, the file is not corrupted even if the update data sent to one of the two subvolumes is lost, which ensures the file's data consistency before and after migration.
Step 605: after file A has been migrated, the API module obtains the file attribute information of file A. This information may include a marker indicating that the file has completed migration, or it may simply not include the marker indicating that the file is being migrated; either case indicates that file A has completed migration.
Step 606: based on this file attribute information, the API module determines that file A has completed migration, sends update data for file A to distributed subvolume D2, and no longer sends update data for file A to distributed subvolume D1. In addition, file A may be deleted from distributed subvolume D1.
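The routing decision made by the API module in steps 601 to 606 can be sketched as a small function. The attribute keys `"migrating"`, `"migrated"`, and `"target"`, the function name `update_targets`, and the idea of passing in the subvolume that currently holds the file are all assumptions for illustration; the text only says that the attribute information carries a migrating marker and information about the target subvolume.

```python
def update_targets(attrs, holding_volume):
    """Return the subvolumes that should receive an update for one file.

    attrs is a dict with the hypothetical keys "migrating" (bool) and
    "target" (name of the target subvolume); holding_volume is the
    subvolume that currently holds the file.
    """
    if attrs.get("migrating"):
        # Steps 603-604: mirror the update to both source and target.
        return [holding_volume, attrs["target"]]
    # Steps 602 and 605-606: no "migrating" marker, so write to one place only.
    return [holding_volume]


before = update_targets({}, "D1")
during = update_targets({"migrating": True, "target": "D2"}, "D1")
after = update_targets({"migrated": True}, "D2")
print(before, during, after)
```

Treating "completed" and "no migrating marker" identically follows step 605, where either condition indicates that the file has finished migrating.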
Based on the same inventive concept as the above method, the embodiments of the present invention also provide a file migration apparatus, which is applied on a control device. The apparatus may be implemented in software, in hardware, or in a combination of the two. Taking a software implementation as an example, the apparatus, as a logical entity, is formed when the processor of the control device on which it resides reads the corresponding computer program instructions from non-volatile memory. At the hardware level, Fig. 7 shows a hardware structure diagram of the control device on which the file migration apparatus proposed by the present invention resides. In addition to the processor and non-volatile memory shown in Fig. 7, the control device may include other hardware, such as a forwarding chip responsible for processing packets, a network interface, and memory. In terms of hardware structure, the control device may also be a distributed device and may include multiple interface cards, so that packet processing can be extended at the hardware level.
Fig. 8 is a structural diagram of the file migration apparatus proposed by the present invention. The apparatus includes:
a first selection module 11, configured to select, from all files to be migrated, the first file having the highest priority;
a second selection module 12, configured to select a first node by using the weight of each node in the global data set, where the global data set records the correspondence between each node in the distributed storage system and its weight;
a migration module 13, configured to migrate the first file from the source distributed subvolume to the target distributed subvolume by means of the first node;
a processing module 14, configured to increase the weight of the first node in the global data set and set the first file as no longer a file to be migrated; if files to be migrated still remain, the first selection module 11 again performs the process of selecting, from all files to be migrated, the first file having the highest priority.
In one example, the file migration apparatus further includes an acquisition module (not shown in the figure), configured to obtain the file size of each file to be migrated and prioritize the files to be migrated by file size: the larger the file, the higher its priority.
The second selection module 12 is specifically configured to, when selecting the first node by using the weight of each node in the global data set: query the local set of the source distributed subvolume where the first file resides to obtain the nodes contained in the source distributed subvolume, query the global data set for the weights of those nodes, and select the node with the smallest weight as the first node; or, select the node with the smallest weight from the global data set and determine whether the currently selected node is in the local set of the source distributed subvolume where the first file resides; if so, determine the currently selected node to be the first node; if not, select the node with the second-smallest weight from the global data set and continue determining whether the currently selected node is in the local set, until the currently selected node is in the local set, and then determine the currently selected node to be the first node. The local set records the nodes contained in the source distributed subvolume.
The processing module 14 is further configured to reduce the weight of the first node in the global data set after the first node has finished migrating the first file from the source distributed subvolume to the target distributed subvolume.
The acquisition module is further configured to obtain, while the first file is being migrated from the source distributed subvolume to the target distributed subvolume, first file attribute information corresponding to the first file, where the first file attribute information includes a marker indicating that the file is being migrated and information about the target distributed subvolume.
The apparatus further includes a sending module (not shown in the figure), configured to determine, based on the first file attribute information, that the first file is in the process of migration, and to send update data for the first file to the source distributed subvolume and the target distributed subvolume.
The acquisition module is further configured to obtain, after the first file has been migrated from the source distributed subvolume to the target distributed subvolume, second file attribute information corresponding to the first file, where the second file attribute information includes a marker indicating that the file has completed migration or does not include the marker indicating that the file is being migrated.
The sending module is further configured to determine, based on the second file attribute information, that the first file has completed migration, and to send update data for the first file to the target distributed subvolume.
The modules of the apparatus of the present invention may be integrated into one unit or deployed separately. The above modules may be combined into one module or further split into multiple submodules.
From the description of the above embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention. Those skilled in the art will understand that the accompanying drawings are schematic diagrams of a preferred embodiment, and that the modules or flows in the drawings are not necessarily required to implement the present invention.
Those skilled in the art will understand that the modules of the apparatus in an embodiment may be distributed in the apparatus as described in the embodiment, or may be relocated, with corresponding changes, to one or more apparatuses different from that of the embodiment. The modules of the above embodiments may be combined into one module or further split into multiple submodules. The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.
The above discloses only several specific embodiments of the present invention. The present invention is not limited to them, and any variation that a person skilled in the art can conceive of shall fall within the protection scope of the present invention.

Claims (10)

1. A file migration method, characterized in that the method comprises the following steps:
selecting, from all files to be migrated, a first file having the highest priority;
selecting a first node by using the weight of each node in a global data set, wherein the global data set records the correspondence between each node in a distributed storage system and its weight;
migrating the first file from a source distributed subvolume to a target distributed subvolume by means of the first node;
increasing the weight of the first node in the global data set, setting the first file as no longer a file to be migrated, and, if files to be migrated still remain, returning to the step of selecting, from all files to be migrated, the first file having the highest priority.
2. The method according to claim 1, characterized in that the process of selecting the first node by using the weight of each node in the global data set specifically comprises:
querying the local set corresponding to the source distributed subvolume where the first file resides to obtain the nodes contained in the source distributed subvolume; querying the global data set for the weights of the nodes contained in the source distributed subvolume, and selecting the node with the smallest weight as the first node; or,
selecting the node with the smallest weight from the global data set, and determining whether the currently selected node is in the local set corresponding to the source distributed subvolume where the first file resides; if so, determining the currently selected node to be the first node; if not, selecting the node with the second-smallest weight from the global data set and continuing to determine whether the currently selected node is in the local set, until the currently selected node is in the local set, and determining the currently selected node to be the first node;
wherein the local set records the nodes contained in the source distributed subvolume.
3. The method according to claim 1, characterized in that, after the first file is migrated from the source distributed subvolume to the target distributed subvolume by means of the first node, the method further comprises:
after the first node has finished migrating the first file from the source distributed subvolume to the target distributed subvolume, reducing the weight of the first node in the global data set.
4. The method according to any one of claims 1-3, characterized in that, while the first file is being migrated from the source distributed subvolume to the target distributed subvolume, the method further comprises:
obtaining first file attribute information corresponding to the first file, wherein the first file attribute information includes a marker indicating that the file is being migrated and information about the target distributed subvolume;
determining, according to the first file attribute information, that the first file is in the process of migration, and sending update data for the first file to the source distributed subvolume and the target distributed subvolume.
5. The method according to claim 4, characterized in that, after the first node has finished migrating the first file from the source distributed subvolume to the target distributed subvolume, the method further comprises:
obtaining second file attribute information corresponding to the first file, wherein the second file attribute information includes a marker indicating that the file has completed migration or does not include the marker indicating that the file is being migrated;
determining, according to the second file attribute information, that the first file has completed migration, and sending update data for the first file to the target distributed subvolume.
6. A file migration apparatus, characterized in that the apparatus specifically comprises:
a first selection module, configured to select, from all files to be migrated, a first file having the highest priority;
a second selection module, configured to select a first node by using the weight of each node in a global data set, wherein the global data set records the correspondence between each node in a distributed storage system and its weight;
a migration module, configured to migrate the first file from a source distributed subvolume to a target distributed subvolume by means of the first node;
a processing module, configured to increase the weight of the first node in the global data set and set the first file as no longer a file to be migrated, wherein, if files to be migrated still remain, the first selection module performs the process of selecting, from all files to be migrated, the first file having the highest priority.
7. The apparatus according to claim 6, characterized in that
the second selection module is specifically configured to, in the process of selecting the first node by using the weight of each node in the global data set: query the local set corresponding to the source distributed subvolume where the first file resides to obtain the nodes contained in the source distributed subvolume, query the global data set for the weights of the nodes contained in the source distributed subvolume, and select the node with the smallest weight as the first node; or, select the node with the smallest weight from the global data set, and determine whether the currently selected node is in the local set corresponding to the source distributed subvolume where the first file resides; if so, determine the currently selected node to be the first node; if not, select the node with the second-smallest weight from the global data set and continue determining whether the currently selected node is in the local set, until the currently selected node is in the local set, and determine the currently selected node to be the first node; wherein the local set records the nodes contained in the source distributed subvolume.
8. The apparatus according to claim 6, characterized in that
the processing module is further configured to, after the first file is migrated from the source distributed subvolume to the target distributed subvolume by means of the first node, and once the first node has finished migrating the first file from the source distributed subvolume to the target distributed subvolume, reduce the weight of the first node in the global data set.
9. The apparatus according to any one of claims 6-8, characterized in that the apparatus further comprises:
an acquisition module, configured to obtain, while the first file is being migrated from the source distributed subvolume to the target distributed subvolume, first file attribute information corresponding to the first file, wherein the first file attribute information includes a marker indicating that the file is being migrated and information about the target distributed subvolume;
a sending module, configured to determine, according to the first file attribute information, that the first file is in the process of migration, and to send update data for the first file to the source distributed subvolume and the target distributed subvolume.
10. The apparatus according to claim 9, characterized in that
the acquisition module is further configured to obtain, after the first file has been migrated from the source distributed subvolume to the target distributed subvolume, second file attribute information corresponding to the first file, wherein the second file attribute information includes a marker indicating that the file has completed migration or does not include the marker indicating that the file is being migrated;
the sending module is further configured to determine, according to the second file attribute information, that the first file has completed migration, and to send update data for the first file to the target distributed subvolume.
CN201610512718.8A 2016-06-29 2016-06-29 A kind of method and apparatus of file migration Active CN105930545B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610512718.8A CN105930545B (en) 2016-06-29 2016-06-29 A kind of method and apparatus of file migration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610512718.8A CN105930545B (en) 2016-06-29 2016-06-29 A kind of method and apparatus of file migration

Publications (2)

Publication Number Publication Date
CN105930545A true CN105930545A (en) 2016-09-07
CN105930545B CN105930545B (en) 2019-07-16

Family

ID=56830236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610512718.8A Active CN105930545B (en) 2016-06-29 2016-06-29 Method and device for migrating files

Country Status (1)

Country Link
CN (1) CN105930545B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107087031A (en) * 2017-05-10 2017-08-22 浙江宇视科技有限公司 Storage resource load-balancing method and device
CN109981697A (en) * 2017-12-27 2019-07-05 深圳市优必选科技有限公司 File dump method, system, server and storage medium
CN111198649A (en) * 2018-11-16 2020-05-26 浙江宇视科技有限公司 Disk selection method and device
CN111897494A (en) * 2020-07-27 2020-11-06 星辰天合(北京)数据科技有限公司 Target file processing method and device
CN113590535A (en) * 2021-09-30 2021-11-02 中国人民解放军国防科技大学 Efficient data migration method and device for deduplication storage system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7987153B2 (en) * 2006-09-29 2011-07-26 Electronics And Telecommunications Research Institute Apparatus and method for automatically migrating user's working data
CN103106152A (en) * 2012-12-13 2013-05-15 深圳先进技术研究院 Data scheduling method based on gradation storage medium
CN103561057A (en) * 2013-10-15 2014-02-05 深圳清华大学研究院 Data storage method based on distributed hash table and erasure codes
CN105404474A (en) * 2015-12-07 2016-03-16 上海爱数信息技术股份有限公司 Data migration method of heterogeneous distributed memory system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107087031A (en) * 2017-05-10 2017-08-22 浙江宇视科技有限公司 Storage resource load-balancing method and device
CN109981697A (en) * 2017-12-27 2019-07-05 深圳市优必选科技有限公司 File dump method, system, server and storage medium
CN109981697B (en) * 2017-12-27 2021-09-17 深圳市优必选科技有限公司 File unloading method, system, server and storage medium
CN111198649A (en) * 2018-11-16 2020-05-26 浙江宇视科技有限公司 Disk selection method and device
CN111198649B (en) * 2018-11-16 2023-07-21 浙江宇视科技有限公司 Disk selection method and device
CN111897494A (en) * 2020-07-27 2020-11-06 星辰天合(北京)数据科技有限公司 Target file processing method and device
CN113590535A (en) * 2021-09-30 2021-11-02 中国人民解放军国防科技大学 Efficient data migration method and device for deduplication storage system
CN113590535B (en) * 2021-09-30 2021-12-17 中国人民解放军国防科技大学 Efficient data migration method and device for deduplication storage system

Also Published As

Publication number Publication date
CN105930545B (en) 2019-07-16

Similar Documents

Publication Publication Date Title
CN105930545A (en) Method and device for migrating files
CN107302561B (en) Hot-spot data replica placement method in a cloud storage system
US5548724A (en) File server system and file access control method of the same
CN103502926B (en) Extent-based storage architecture
US7685459B1 (en) Parallel backup
US6615313B2 (en) Disk input/output control device maintaining write data in multiple cache memory modules and method and medium thereof
CN102662992B (en) Method and device for storing and accessing massive small files
US20150127649A1 (en) Efficient implementations for mapreduce systems
US8768980B2 (en) Process for optimizing file storage systems
US7689764B1 (en) Network routing of data based on content thereof
US20090182960A1 (en) Using multiple sidefiles to buffer writes to primary storage volumes to transfer to corresponding secondary storage volumes in a mirror relationship
CN103513938B (en) RAID system expansion method and device
CN107196982A (en) Method and apparatus for processing user requests
US20070073986A1 (en) Remote copy control in a storage system
CN105630418A (en) Data storage method and device
CN107807794A (en) Data storage method and device
US8516070B2 (en) Computer program and method for balancing processing load in storage system, and apparatus for managing storage devices
CN107329704A (en) Cache mirroring method and controller
CN106603692A (en) Data storage method in distributed storage system and apparatus thereof
CN106682184A (en) Lightweight merging method based on a log-structured merge tree
JP2009245004A (en) Bidirectional data arrangement system, access analysis server, data movement server, bidirectional data arrangement method and program
CN103177080A (en) File pre-reading method and file pre-reading device
KR101465447B1 (en) Method for external merge sort, system for external merge sort and distributed processing system for external merge sort
KR101531564B1 (en) Method and system for load balancing of an iSCSI storage system using a network distributed file system
US11023431B2 (en) Split data migration in a data storage system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant