CN109614383A - Data copy method, device, electronic equipment and storage medium - Google Patents


Publication number
CN109614383A
Authority
CN
China
Prior art keywords
data node
source
data
files
source file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811387721.7A
Other languages
Chinese (zh)
Other versions
CN109614383B (en)
Inventor
费伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Golden Panda Co Ltd
Original Assignee
Golden Panda Co Ltd
Application filed by Golden Panda Co Ltd
Priority to CN201811387721.7A
Publication of CN109614383A
Application granted
Publication of CN109614383B
Legal status: Active

Abstract

Embodiments of the present invention provide a data copy method, a data copy device, an electronic device, and a storage medium, and relate to the field of big data technology. The method comprises: obtaining the location of the source data node where a source file block resides and the location of the destination data node where a destination file block resides; determining whether the location of the source data node and the location of the destination data node belong to the same data node; when they belong to the same data node, copying the source file block by means of a hard link; and when they do not belong to the same data node, copying the source file block to the destination file block by means of data copy. The technical solution of the embodiments can significantly improve copy efficiency and reduce the occupation of actual hard disk storage space.

Description

Data copy method, device, electronic equipment and storage medium
Technical field
The present invention relates to the field of big data technology, and in particular to a data copy method, a data copy device, an electronic device, and a computer-readable storage medium.
Background art
With the development of Internet technology, distributed file systems such as HDFS (Hadoop Distributed File System) are used more and more widely. In a distributed file system such as HDFS, files stored on data nodes often need to be replicated or copied.
In one technical solution, files inside a distributed system such as HDFS can be copied with the cp command or with the distcp command. The cp approach obtains the list of all files under the directory to be copied and then copies the file metadata and the file blocks. The distcp approach likewise first obtains the list of all files under the directory to be copied, then starts distributed map tasks according to the configured parameters and copies the files concurrently.
However, both the cp command and the distcp command perform actual file reads and writes: the source file must first be read and then written to the destination address. In a distributed system, reads and writes across the network also occur. Because the copy speed of both schemes is limited by disk hardware, network cards, and process concurrency, copying a large volume of data often takes several hours. In addition, because both schemes consume actual disk space once the copy is made, disk space utilization is extremely low in a distributed file system that holds a large amount of duplicate data.
Accordingly, it is desirable to provide a data copy method, a data copy device, an electronic device, and a computer-readable storage medium that can solve one or more of the above problems.
It should be noted that the information disclosed in the Background section above is only intended to enhance understanding of the background of the present invention, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
An object of the present invention is to provide a data copy method, a data copy device, an electronic device, and a computer-readable storage medium that overcome, at least to some extent, the problem of long copy times caused by the limits of disk hardware, network cards, and process concurrency, and the problem of low disk space utilization.
According to a first aspect of the embodiments of the present invention, a data copy method is provided, applied to a distributed system having multiple data nodes, comprising: obtaining the location of the source data node where a source file block resides and the location of the destination data node where a destination file block resides; determining whether the location of the source data node and the location of the destination data node belong to the same data node; when they belong to the same data node, copying the source file block by means of a hard link; and when they do not belong to the same data node, copying the source file block to the destination file block by means of data copy.
In some embodiments of the present invention, based on the foregoing scheme, the data copy method further comprises: upon receiving a data update request for the source file, determining the file block to be updated from the name node based on the data update request; determining whether the file block to be updated has a hard link; when a hard link exists, creating a temporary file block, copying the content of the source file block into it, and performing the data update operation on the temporary file block; and when no hard link exists, performing the data update operation directly on the file block to be updated.
In some embodiments of the present invention, based on the foregoing scheme, the data copy method further comprises: traversing the directory of the source file in the name node to obtain all source file block information of the source file; obtaining the location of the source data node from the source file block information, and creating the destination file block based on the source file block information and the location of the source data node; and generating a replication task based on the source file block, the location of the source data node, and the destination file block.
In some embodiments of the present invention, based on the foregoing scheme, the data copy method further comprises: obtaining, from the replication task, the location of the source data node where the source file block resides; and determining, based on the location of the source data node, the destination data node where the destination file block resides.
In some embodiments of the present invention, based on the foregoing scheme, determining the destination data node where the destination file block resides based on the location of the source data node comprises: determining the location of the source data node as the destination data node of the destination file block.
In some embodiments of the present invention, based on the foregoing scheme, obtaining, from the replication task, the location of the source data node where the source file block resides comprises: obtaining, by means of multithreading, the location of the source data node where the source file resides from the replication task.
In some embodiments of the present invention, based on the foregoing scheme, copying the source file block by means of a hard link comprises: creating, on the source data node, a new link for the destination file block that points to the source file block.
According to a second aspect of the embodiments of the present invention, a data copy device is provided, applied to a distributed system having multiple data nodes, comprising: an information acquisition unit, configured to obtain the location of the source data node where a source file block resides and the location of the destination data node where a destination file block resides; a judging unit, configured to determine whether the location of the source data node and the location of the destination data node belong to the same data node; a local replica unit, configured to copy the source file block by means of a hard link when they belong to the same data node; and a data copy unit, configured to copy the source file block to the destination file block by means of data copy when they do not belong to the same data node.
According to a third aspect of the embodiments of the present invention, an electronic device is provided, comprising: a processor; and a memory storing computer-readable instructions which, when executed by the processor, implement the data copy method described in the first aspect above.
According to a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the data copy method described in the first aspect above.
In the technical solutions provided by some embodiments of the present invention, on the one hand, the location of the source data node of the source file block and the location of the destination data node of the destination file block are obtained, so that whether the source file block and the destination file block belong to the same data node can be determined from those two locations. On the other hand, when the source file block and the destination file block belong to the same data node, the source file is copied by means of a hard link. Because a hard link does not copy the actual file, copy efficiency can be significantly improved, the occupation of actual hard disk storage space is reduced, and the utilization of hard disk storage space is improved.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present invention.
Brief description of the drawings
The accompanying drawings are incorporated into and form part of this specification; they show embodiments consistent with the present invention and, together with the specification, serve to explain the principles of the present invention. The drawings described below are evidently only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort. In the drawings:
Fig. 1 shows a flow diagram of a data copy method according to some embodiments of the present invention;
Fig. 2 shows a flow diagram of a data copy method according to other embodiments of the present invention;
Fig. 3 shows a schematic block diagram of a data copy device according to an exemplary embodiment of the present invention;
Fig. 4 shows a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present invention.
Detailed description of the embodiments
Example embodiments are now described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in a variety of forms and should not be understood as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present invention will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. Identical reference numerals in the figures denote identical or similar parts, so their repeated description is omitted.
In addition, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the present invention. Those skilled in the art will appreciate, however, that the technical solution of the present invention can be practiced without one or more of these specific details, or with other methods, components, devices, steps, and so on. In other cases, well-known methods, devices, implementations, or operations are not shown or described in detail, to avoid obscuring aspects of the present invention.
The block diagrams shown in the drawings are only functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative: they need not include all contents and operations/steps, and the steps need not be executed in the described order. For example, some operations/steps can be decomposed, and some can be merged or partially merged, so the actual execution order may change according to the actual situation.
Fig. 1 shows a flow diagram of a data copy method according to some embodiments of the present invention. Referring to Fig. 1, the data copy method may include the following steps:
Step S110, obtaining the location of the source data node where the source file block resides and the location of the destination data node where the destination file block resides;
Step S120, determining whether the location of the source data node and the location of the destination data node belong to the same data node;
Step S130, when they belong to the same data node, copying the source file block by means of a hard link;
Step S140, when they do not belong to the same data node, copying the source file block to the destination file block by means of data copy.
According to the data copy method in the example embodiment of Fig. 1, on the one hand, the location of the source data node of the source file block and the location of the destination data node of the destination file block are obtained, so that whether the source file block and the destination file block belong to the same data node can be determined from those two locations. On the other hand, when the source file block and the destination file block belong to the same data node, the source file is copied by means of a hard link. Because a hard link does not copy the actual file, copy efficiency can be significantly improved, the occupation of actual hard disk storage space is reduced, and the utilization of hard disk storage space is improved.
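The decision in steps S120 to S140 can be illustrated with a minimal Python sketch on a local file system. This is not the patent's implementation: the `BlockLocation` record, the `copy_block` function, and the idea of representing a data node by a plain string identifier are all assumptions made for illustration, and the "cross-node" branch is simulated with an ordinary local file copy.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass
class BlockLocation:
    """Hypothetical record: which data node holds a block, and its path on disk."""
    node_id: str
    path: str


def copy_block(src: BlockLocation, dst: BlockLocation) -> str:
    """Copy one block and return the strategy used ("hardlink" or "copy")."""
    if src.node_id == dst.node_id:        # S120: same data node?
        os.link(src.path, dst.path)       # S130: new directory entry, no data I/O
        return "hardlink"
    shutil.copyfile(src.path, dst.path)   # S140: real read and write of the data
    return "copy"
```

When both locations name the same node, the destination path ends up sharing the source block's inode; otherwise a separate file with its own inode is written.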
The data copy method in the example embodiment of Fig. 1 is described in detail below.
In step S110, the location of the source data node where the source file block resides and the location of the destination data node where the destination file block resides are obtained.
In an example embodiment, the block information of all file blocks of the source file to be copied can be queried from the name node (NameNode), and the location information of the data nodes holding these source file blocks can be obtained. In addition, the location of the destination data node, that is, the data node where the destination file resides, can also be obtained from the name node. The name node (NameNode) is the manager of the distributed file system and is responsible for managing the file system namespace, the replication of data blocks, and so on. The data node (DataNode) is the basic unit of file storage and stores the content of files in the distributed file system in the form of data blocks or file blocks.
In step S120, it is determined whether the location of the source data node and the location of the destination data node belong to the same data node.
In an example embodiment, it is determined whether the location of the source data node, that is, the data node where the source file block resides, and the location of the destination data node, that is, the data node where the destination file block resides, belong to the same data node.
In step S130, when they belong to the same data node, the source file block is copied by means of a hard link.
In an example embodiment, when the location of the source data node where the source file block resides and the location of the destination data node where the destination file block resides belong to the same data node, the source file block is copied by means of a hard link, that is, a local copy: using the hard link mechanism of the Linux system, a new link pointing to the source file block is created.
A hard link is equivalent to an alias of a file block. It points to the reference address of the file's inode (index node), rather than to a file path as a soft link does. A modification made through a hard link therefore affects the actual file it points to. After a hard link is deleted, the underlying file data is removed only if the inode it pointed to is no longer referenced by any other hard link; otherwise the data is kept.
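The hard link behavior described above can be observed directly with Python's standard library on a POSIX file system. The file names below are hypothetical; the sketch only demonstrates the three properties in the paragraph (shared inode, shared modifications, deletion by link count).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "blk_src")
    alias = os.path.join(d, "blk_dst")
    with open(original, "w") as f:
        f.write("v1")

    os.link(original, alias)                    # create the hard link
    same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
    link_count = os.stat(original).st_nlink     # two names, one inode

    with open(alias, "a") as f:                 # modify through the alias
        f.write("+v2")
    with open(original) as f:
        seen_via_original = f.read()            # the change is visible here too

    os.remove(original)                         # drop one of the two names
    with open(alias) as f:
        survives = f.read()                     # data remains while a link exists
```

Because both names resolve to the same inode, a write through either name changes the content seen through the other, which is why the update handling described later in the specification is needed.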
In step S140, when they do not belong to the same data node, the source file block is copied to the destination file block by means of data copy.
In an example embodiment, when the location of the source data node where the source file block resides and the location of the destination data node where the destination file block resides do not belong to the same data node, the source file block is copied to the destination file block by means of data copy. The data copy mode needs to write the data of the source block to another data node, so actual data reads and writes occur.
In addition, in an example embodiment, the directory of the source file in the name node is traversed to obtain all source file block information of the source file; the location of the source data node of each source file block, that is, the data node it belongs to, is obtained from the source file block information, and the destination file block is created based on the source file block information and the location of the source data node; a replication task (copy task) is then generated based on the source file block, the location of the source data node, and the destination file block. When the destination file block is created, the destination file block information of the name node is generated. To prevent a copy failure caused by insufficient actual free space on a node, information about the data node preferred for the copy must be generated; this data node information is consistent with the data node information of the source file's data block, and when that data node has insufficient space, the block can be copied to another data node.
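The task generation described above might be sketched as follows. The `ReplicationTask` fields, the `build_tasks` function, the `_copy` naming of destination blocks, and the fallback rule (pick the node with the most free space) are all assumptions for illustration; the specification only states that the source's data node is preferred and that another node may be used when space is insufficient.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReplicationTask:
    src_block: str
    src_node: str
    dst_block: str
    dst_node: str


def build_tasks(src_blocks: Dict[str, str], free_bytes: Dict[str, int],
                block_size: int) -> List[ReplicationTask]:
    """src_blocks maps block id -> data node; free_bytes maps node -> free space."""
    tasks = []
    for block_id, node in src_blocks.items():
        target = node                              # preferred: same node as source
        if free_bytes.get(node, 0) < block_size:   # not enough room: fall back
            target = max(free_bytes, key=free_bytes.get)  # assumed fallback rule
        tasks.append(ReplicationTask(block_id, node, f"{block_id}_copy", target))
    return tasks
```

Tasks whose `dst_node` equals `src_node` are the ones that can later be served by a hard link; the others require a real data copy.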
Further, in an example embodiment, the information of the source file block and the destination file block in the replication task can be read by means of multithreading, and the name node is contacted to obtain the location of the data node where the source file block resides.
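A multithreaded lookup of this kind could look like the sketch below. The name node is replaced by a hypothetical in-memory table (`BLOCK_TABLE`), and `locate` and `locate_all` are illustrative names; a real system would issue a network request to the name node inside `locate`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for the name node's block-to-node table.
BLOCK_TABLE: Dict[str, str] = {"blk_1": "dn1", "blk_2": "dn1", "blk_3": "dn2"}


def locate(block_id: str) -> Tuple[str, str]:
    """Simulated name node lookup for a single block."""
    return block_id, BLOCK_TABLE[block_id]


def locate_all(block_ids: List[str], workers: int = 4) -> Dict[str, str]:
    """Resolve many blocks concurrently and return block id -> data node."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(locate, block_ids))
```

Threads suit this step because each lookup mostly waits on I/O rather than on computation.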
In addition, in an example embodiment, because hard links are used, the source file block and the destination file block point to the same data block; updating or modifying the actual file block or data block would therefore change the content of both the source file block and the destination file block, so special handling is needed. Specifically, upon receiving a data update request for the source file, for example an append operation, the file block to be updated is determined from the name node based on the data update request; it is determined whether the file block to be updated has a hard link; when a hard link exists, a temporary file block is created, the content of the source file block is copied into it, and the data update operation is performed on the temporary file block; when no hard link exists, the data update operation is performed directly on the file block to be updated.
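The update handling above resembles copy-on-write. A minimal local sketch, assuming a POSIX file system, is given below. The `append_to_block` name and the `.tmp` suffix are hypothetical, and the final `os.replace` step, which swaps the temporary block in under the original name, is an assumption: the specification describes updating the temporary block but does not say how it is put back in place.

```python
import os
import shutil


def append_to_block(path: str, data: bytes) -> None:
    """Append to a block without changing other names linked to the same inode."""
    if os.stat(path).st_nlink > 1:       # the block is shared through hard links
        tmp = path + ".tmp"              # hypothetical temporary block name
        shutil.copyfile(path, tmp)       # private copy of the current content
        with open(tmp, "ab") as f:
            f.write(data)                # update the temporary block
        os.replace(tmp, path)            # assumed step: swap it in under this name
    else:
        with open(path, "ab") as f:      # unshared block: update in place
            f.write(data)
```

After the swap, the other hard-linked name still sees the old content, and only the updated name sees the appended data.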
Fig. 2 shows a flow diagram of a data copy method according to other embodiments of the present invention.
Referring to Fig. 2, in step S210, a data copy request sent by a client is received. For example, the client initiates a copy request, communicates with the data node where the source file block resides, and sends the data copy request to the data node where the source file resides.
Further, in an example embodiment, the directory of the source file in the name node is traversed to obtain all source file block information of the source file; the location of the source data node of each source file block, that is, the data node it belongs to, is obtained from the source file block information, and the destination file block is created based on the source file block information and the location of the source data node; a replication task (copy task) is then generated based on the source file block, the location of the source data node, and the destination file block. When the destination file block is created, the destination file block information of the name node is generated. To prevent a copy failure caused by insufficient actual free space on a node, information about the data node preferred for the copy must be generated; this data node information is consistent with the data node information of the source file's data block, and when that data node has insufficient space, the block can be copied to another data node.
In step S220, the data copy request is read by means of multithreading, the source file block and the destination file block are obtained from the data copy request, and the name node is contacted to obtain the location of the data node where the source file block resides and the location of the data node where the destination file block resides.
In step S230, when the location of the source data node where the source file block resides and the location of the destination data node where the destination file block resides belong to the same data node, namely data node 1, the source file block is copied by means of a hard link, that is, a local copy: using the hard link mechanism of the Linux system, a new link pointing to the source file block is created. Creating a link takes on the order of milliseconds, so copy efficiency can be significantly improved. Further, the information of the hard-linked block can be sent to the name node.
In step S240, when the location of the source data node where the source file block resides and the location of the destination data node where the destination file block resides do not belong to the same data node, that is, the source file block belongs to data node 1 and the destination file block belongs to data node 2, the source file block is copied from data node 1 to the destination file block on data node 2 by means of data copy. The data copy mode needs to write the data of the source block to another data node, so actual data reads and writes occur. The data copy mode is consistent with the way cp and distcp copy.
In an example embodiment, data is produced by continuous iterative updates based on the original version, and even when a new version of the data is produced, only some files are overwritten directly, so update operations on file blocks do not occur. When files are copied with the native cp or distcp command, copying one data version generally takes several hours and also doubles the space used. After the technical solution of the example embodiments of the present invention is used for data copy, the version iteration cycle of the data is greatly shortened: a data version can be copied in a few minutes or even a few seconds, and the actual space occupied by the data does not increase, which significantly saves labor and time costs and reduces hardware costs.
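The space claim above can be checked on a local POSIX file system by counting each inode only once. The `physical_bytes` helper is a hypothetical name, and it sums logical file sizes (`st_size`) rather than allocated disk blocks, which is a simplification.

```python
import os


def physical_bytes(paths):
    """Sum file sizes, counting each inode once so hard links add nothing."""
    seen = set()
    total = 0
    for p in paths:
        st = os.stat(p)
        key = (st.st_dev, st.st_ino)   # identifies the underlying file
        if key not in seen:
            seen.add(key)
            total += st.st_size
    return total
```

Measured this way, a block plus its hard link costs the size of one block, while a block plus a real copy costs twice that.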
In addition, an embodiment of the present invention also provides a data copy device, which can be applied to a distributed system having multiple data nodes. Referring to Fig. 3, the data copy device 300 may include: an information acquisition unit 310, a judging unit 320, a local replica unit 330, and a data copy unit 340. The information acquisition unit 310 is configured to obtain the location of the source data node where the source file block resides and the location of the destination data node where the destination file block resides; the judging unit 320 is configured to determine whether the location of the source data node and the location of the destination data node belong to the same data node; the local replica unit 330 is configured to copy the source file block by means of a hard link when they belong to the same data node; and the data copy unit 340 is configured to copy the source file block to the destination file block by means of data copy when they do not belong to the same data node.
In some embodiments of the present invention, based on the foregoing scheme, the data copy device 300 further includes: a determination unit, configured to determine, upon receiving a data update request for the source file, the file block to be updated from the name node based on the data update request; a hard link judging unit, configured to determine whether the file block to be updated has a hard link; a first updating unit, configured to create a temporary file block when a hard link exists, copy the content of the source file block into it, and perform the data update operation on the temporary file block; and a second updating unit, configured to perform the data update operation directly on the file block to be updated when no hard link exists.
In some embodiments of the present invention, based on the foregoing scheme, the data copy device 300 further includes: a source file block information acquisition unit, configured to traverse the directory of the source file in the name node and obtain all source file block information of the source file; a destination file block creation unit, configured to obtain the location of the source data node from the source file block information and create the destination file block based on the source file block information and the location of the source data node; and a replication task generation unit, configured to generate a replication task based on the source file block, the location of the source data node, and the destination file block.
In some embodiments of the present invention, based on the foregoing scheme, the data copy device 300 further includes: a position acquisition unit, configured to obtain, from the replication task, the location of the source data node where the source file block resides; and a node determination unit, configured to determine, based on the location of the source data node, the destination data node where the destination file block resides.
In some embodiments of the present invention, based on the foregoing scheme, the node determination unit is configured to: determine the location of the source data node as the destination data node of the destination file block.
In some embodiments of the present invention, based on the foregoing scheme, the position acquisition unit is configured to: obtain, by means of multithreading, the location of the source data node where the source file resides from the replication task.
In some embodiments of the present invention, based on the foregoing scheme, the local replica unit 330 is configured to: create, on the source data node, a new link for the destination file block that points to the source file block.
Because each functional module of the data copy device 300 of the example embodiments of the present invention corresponds to a step of the example embodiments of the data copy method described above, the details are not repeated here.
In an exemplary embodiment of the present invention, an electronic device capable of implementing the above method is also provided.
Referring now to Fig. 4, it shows a schematic structural diagram of a computer system 400 suitable for implementing the electronic device of the embodiments of the present invention. The computer system 400 of the electronic device shown in Fig. 4 is only an example and should not impose any restriction on the function or scope of use of the embodiments of the present invention.
As shown in Fig. 4, the computer system 400 includes a central processing unit (CPU) 401, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage section 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data needed for system operation. The CPU 401, the ROM 402, and the RAM 403 are connected to one another through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card or a modem. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 410 as needed, so that a computer program read from it can be installed into the storage section 408 as needed.
In particular, according to an embodiment of the present invention, the process described above with reference to the flow chart may be implemented as a computer software program. For example, an embodiment of the present invention includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. When the computer program is executed by the central processing unit (CPU) 401, the above functions defined in the system of the present application are performed.
It should be noted that the computer-readable medium described in the present invention can be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of a computer-readable storage medium include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium can be any tangible medium that contains or stores a program, which can be used by or in connection with an instruction execution system, apparatus, or device. In the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal can take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium can be transmitted over any suitable medium, including but not limited to: wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams or flowcharts, and any combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented in software or in hardware, and the described units may also be provided in a processor. In some cases, the names of these units do not limit the units themselves.
In another aspect, the present application also provides a computer-readable medium. The computer-readable medium may be included in the electronic device described in the above embodiments, or it may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the data copy method described in the above embodiments.
For example, the electronic device may implement the steps shown in Figure 1: step S110, obtaining the position of the source data node where the source file block is located and the position of the destination data node where the destination file block is located; step S120, judging whether the position of the source data node and the position of the destination data node belong to the same data node; step S130, when they are determined to belong to the same data node, copying the source file block by means of a hard link; step S140, when they are determined not to belong to the same data node, copying the source file block to the destination file block by means of a data copy.
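The decision made in steps S110 to S140 can be illustrated with a minimal sketch. It is not the implementation of the embodiments: the `FileBlock` type, its `node` and `path` fields, and the `copy_block` function are hypothetical names introduced only for illustration, and the two data nodes are simulated by paths on one local file system rather than by machines reached over a network.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass
class FileBlock:
    # Hypothetical record: which data node holds the block, and where.
    node: str  # identifier of the data node (assumed to be a plain string)
    path: str  # path of the block file on that data node


def copy_block(src: FileBlock, dst: FileBlock) -> str:
    """Replicate src to dst and report which strategy was used."""
    # S110/S120: compare the data-node positions of the two blocks.
    if src.node == dst.node:
        # S130: same data node, so add a hard link instead of copying bytes.
        # Both names then refer to the same on-disk data.
        os.link(src.path, dst.path)
        return "hard_link"
    # S140: different data nodes, so fall back to copying the data itself.
    # A real system would transfer the bytes over the network here.
    shutil.copyfile(src.path, dst.path)
    return "data_copy"
```

Because a hard link only adds a second directory entry for the same data, the same-node branch consumes no additional block storage, which is the saving the abstract refers to. Note that `os.link` requires both paths to be on the same file system.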
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by multiple modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product. The software product may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, or the like) or on a network, and includes several instructions that cause a computing device (which may be a personal computer, a server, a touch terminal, a network device, or the like) to execute the method according to the embodiments of the present invention.
Other embodiments of the present invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the present invention being indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims (10)

1. A data copy method, applied to a distributed system having a plurality of data nodes, characterized by comprising:
obtaining the position of the source data node where a source file block is located and the position of the destination data node where a destination file block is located;
judging whether the position of the source data node and the position of the destination data node belong to the same data node;
when they are determined to belong to the same data node, copying the source file block by means of a hard link;
when they are determined not to belong to the same data node, copying the source file block to the destination file block by means of a data copy.
2. The data copy method according to claim 1, characterized in that the data copy method further comprises:
when a data update request for the source file is received, determining, from a name node, the file block to be updated based on the data update request;
judging whether a hard link exists for the file block to be updated;
when a hard link is determined to exist, creating a temporary file block that copies the content of the source file block, and performing the data update operation on the temporary file block;
when no hard link is determined to exist, performing the data update operation directly on the file block to be updated.
3. The data copy method according to claim 1, characterized in that the data copy method further comprises:
traversing the directory of the source file in a name node to obtain all source file block information of the source file;
obtaining the position of the source data node from the source file block information, and creating the destination file block based on the source file block information and the position of the source data node;
generating a replication task based on the source file block, the position of the source data node, and the destination file block.
4. The data copy method according to claim 3, characterized in that the data copy method further comprises:
obtaining, from the replication task, the position of the source data node where the source file block is located;
determining, based on the position of the source data node, the destination data node where the destination file block is located.
5. The data copy method according to claim 4, characterized in that determining, based on the position of the source data node, the destination data node where the destination file block is located comprises:
determining the position of the source data node as the destination data node of the destination file block.
6. The data copy method according to claim 4, characterized in that obtaining, from the replication task, the position of the source data node where the source file block is located comprises:
obtaining, in a multithreaded manner, the position of the source data node where the source file block is located from the replication task.
7. The data copy method according to any one of claims 1 to 6, characterized in that copying the source file block by means of a hard link comprises:
creating, in the source data node, a new link from the destination file block that points to the source file block.
8. A data copy device, applied to a distributed system having a plurality of data nodes, characterized by comprising:
an information acquisition unit, configured to obtain the position of the source data node where a source file block is located and the position of the destination data node where a destination file block is located;
a judging unit, configured to judge whether the position of the source data node and the position of the destination data node belong to the same data node;
a local copy unit, configured to copy the source file block by means of a hard link when they are determined to belong to the same data node;
a data copy unit, configured to copy the source file block to the destination file block by means of a data copy when they are determined not to belong to the same data node.
9. An electronic device, characterized by comprising: a processor; and a memory storing computer-readable instructions which, when executed by the processor, implement the data copy method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the data copy method according to any one of claims 1 to 7.
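The update handling recited in claim 2 can be sketched as follows, under stated assumptions. The function name `update_block`, the `offset` parameter, and the `.tmp` suffix are hypothetical; the existence of a hard link is assumed to be detectable from the file's link count (`st_nlink`); and the final step of swapping the temporary block into place with `os.replace` is an assumption of this sketch, since the claim only states that the update is performed on the temporary file block. Locating the block through the name node is omitted.

```python
import os
import shutil


def update_block(path: str, offset: int, data: bytes) -> str:
    """Write data at offset in a block file without altering linked copies."""
    if os.stat(path).st_nlink > 1:
        # Another name shares this block's data, so update a private copy.
        tmp = path + ".tmp"  # hypothetical name for the temporary file block
        shutil.copyfile(path, tmp)
        with open(tmp, "r+b") as f:
            f.seek(offset)
            f.write(data)
        # Assumption: the updated temporary block then takes over the
        # original name, leaving the other hard link on the old content.
        os.replace(tmp, path)
        return "copy_on_write"
    # No other hard link exists, so the block can be updated directly.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
    return "in_place"
```

The check matters because a hard-linked copy shares its data with the source block: writing in place would silently change every copy, whereas updating a temporary block keeps the copies independent.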
CN201811387721.7A 2018-11-21 2018-11-21 Data copying method and device, electronic equipment and storage medium Active CN109614383B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811387721.7A CN109614383B (en) 2018-11-21 2018-11-21 Data copying method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109614383A true CN109614383A (en) 2019-04-12
CN109614383B CN109614383B (en) 2021-01-15

Family

ID=66004675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811387721.7A Active CN109614383B (en) 2018-11-21 2018-11-21 Data copying method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109614383B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988697A (en) * 2021-05-11 2021-06-18 北京华云安信息技术有限公司 Target file copying method, device, equipment and computer readable storage medium
CN115688187A (en) * 2023-01-04 2023-02-03 中科方德软件有限公司 Safety management method and device for hard link data, electronic equipment and computer readable storage medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060256712A1 (en) * 2003-02-21 2006-11-16 Nippon Telegraph And Telephone Corporation Device and method for correcting a path trouble in a communication network
CN102170440A (en) * 2011-03-24 2011-08-31 北京大学 Method suitable for safely migrating data between storage clouds
CN103685368A (en) * 2012-09-10 2014-03-26 中国电信股份有限公司 Method and system for migrating data
CN104603774A (en) * 2012-10-11 2015-05-06 株式会社日立制作所 Migration-destination file server and file system migration method
US20150347046A1 (en) * 2012-12-14 2015-12-03 Netapp, Inc. Push-based piggyback system for source-driven logical replication in a storage environment
US20160188232A1 (en) * 2013-09-05 2016-06-30 Nutanix, Inc. Systems and methods for implementing stretch clusters in a virtualization environment
CN103761162A (en) * 2014-01-11 2014-04-30 深圳清华大学研究院 Data backup method of distributed file system
US20150199243A1 (en) * 2014-01-11 2015-07-16 Research Institute Of Tsinghua University In Shenzhen Data backup method of distributed file system
CN107239480A (en) * 2016-03-28 2017-10-10 阿里巴巴集团控股有限公司 The method and apparatus that renaming operation is performed for distributed file system
CN108268542A (en) * 2016-12-31 2018-07-10 中国移动通信集团河北有限公司 For the method and system of data-base cluster Data Migration
CN108845892A (en) * 2018-04-19 2018-11-20 北京百度网讯科技有限公司 Data processing method, device, equipment and the computer storage medium of distributed data base

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEIXIN_34327761: "复制指定源位置的多级文件夹下所有文件到指定目标位置", 《HTTPS://BLOG.CSDN.NET/WEIXIN_34327761/ARTICLE/DETAILS/85815584》 *
谯林飞: "云计算环境中分布式文件系统数据一致性问题研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Also Published As

Publication number Publication date
CN109614383B (en) 2021-01-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant