CN109614383B - Data copying method and device, electronic equipment and storage medium - Google Patents

Publication number
CN109614383B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811387721.7A
Other languages
Chinese (zh)
Other versions
CN109614383A (en)
Inventor
费伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Golden Panda Ltd
Original Assignee
Golden Panda Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Golden Panda Ltd
Priority to CN201811387721.7A
Publication of CN109614383A
Application granted
Publication of CN109614383B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a data copying method and device, electronic equipment, and a storage medium, and relate to the technical field of big data. The method comprises: acquiring the position of the source data node where a source file block is located and the position of the destination data node where a destination file block is located; judging whether the two positions belong to the same data node; when they belong to the same data node, copying the source file block by means of a hard link; and when they do not, copying the source file block to the destination file block by means of a data copy. The technical scheme can significantly improve copying efficiency and reduce the actual hard disk storage space occupied.

Description

Data copying method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of big data technologies, and in particular, to a data replication method, a data replication device, an electronic device, and a computer-readable storage medium.
Background
With the development of internet technology, distributed file systems such as HDFS (Hadoop Distributed File System) are increasingly widely used. In such systems, it is often necessary to replicate or copy files stored on data nodes.
In one technical scheme, a file in a distributed system such as HDFS can be copied with the cp command or the distcp command. The cp approach obtains the list of all files in the directory to be copied and then copies the file metadata and the file blocks. The distcp approach also obtains the list of all files in the directory to be copied, then starts distributed map tasks according to the configured parameters and copies the files concurrently.
However, both the cp and distcp commands cause actual file reads and writes: each must read the source file and then write it to the destination address. In a distributed system, these reads and writes may also cross the network. The copying speed of both schemes is limited by the disk hardware, the network card, and the number of concurrent processes, so copying a large volume of data usually takes several hours. In addition, because each copy occupies actual disk space, disk space utilization is extremely low for a distributed file system that holds a large amount of duplicated data.
Accordingly, it is desirable to provide a data copying method, a data copying apparatus, an electronic device, and a computer-readable storage medium capable of solving one or more of the above-mentioned problems.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the invention and therefore may include information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
The invention aims to provide a data copying method, a data copying device, electronic equipment, and a computer-readable storage medium that overcome, at least to some extent, the long copying times caused by the limits of the disk hardware, the network card, and concurrent processes, as well as the low utilization of disk space.
According to a first aspect of the embodiments of the present invention, there is provided a data replication method applied to a distributed system having a plurality of data nodes, including: acquiring the position of the source data node where a source file block is located and the position of the destination data node where a destination file block is located; judging whether the position of the source data node and the position of the destination data node belong to the same data node; when they belong to the same data node, copying the source file block by means of a hard link; and when they do not belong to the same data node, copying the source file block to the destination file block by means of a data copy.
In some embodiments of the present invention, based on the foregoing scheme, the data replication method further includes: when a data update request for the source file is received, determining a file block to be updated from a name node based on the data update request; judging whether the file block to be updated has a hard link; when a hard link is judged to exist, creating a temporary file block to copy the content of the source file block, and performing the data update operation on the temporary file block; and when no hard link is judged to exist, performing the data update operation directly on the file block to be updated.
In some embodiments of the present invention, based on the foregoing scheme, the data replication method further includes: traversing the directory of the source file in the name node to acquire all source file block information of the source file; acquiring the position of the source data node from the source file block information, and creating a destination file block based on the source file block information and the position of the source data node; and generating a replication task based on the source file block, the position of the source data node and the destination file block.
In some embodiments of the present invention, based on the foregoing scheme, the data replication method further includes: acquiring the position of the source data node where the source file block is located from the replication task; and determining a destination data node where the destination file block is located based on the position of the source data node.
In some embodiments of the present invention, based on the foregoing solution, determining a destination data node where the destination file block is located based on the location of the source data node includes: determining the source data node as the destination data node of the destination file block.
In some embodiments of the present invention, based on the foregoing solution, obtaining, from the replication task, the location of the source data node where the source file block is located includes: obtaining the location of the source data node where the source file block is located from the replication task in a multithreaded manner.
In some embodiments of the present invention, based on the foregoing scheme, copying the source file block by means of a hard link includes: creating, by the source data node, a link for the destination file block that points to the source file block.
According to a second aspect of the embodiments of the present invention, there is provided a data replication apparatus applied to a distributed system having a plurality of data nodes, including: an information acquisition unit, configured to acquire the position of the source data node where a source file block is located and the position of the destination data node where a destination file block is located; a judging unit, configured to judge whether the position of the source data node and the position of the destination data node belong to the same data node; a local replication unit, configured to copy the source file block by means of a hard link when the positions are judged to belong to the same data node; and a data copying unit, configured to copy the source file block to the destination file block by means of a data copy when the positions are judged not to belong to the same data node.
According to a third aspect of embodiments of the present invention, there is provided an electronic apparatus, including: a processor; and a memory having computer readable instructions stored thereon which, when executed by the processor, implement the data replication method as described above in the first aspect.
According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data replication method as described in the first aspect above.
In the technical solutions provided in some embodiments of the present invention, on the one hand, the position of the source data node of the source file block and the position of the destination data node of the destination file block are obtained, so that it can be determined from these positions whether the source file block and the destination file block belong to the same data node; on the other hand, when they belong to the same data node, the source file block is copied by means of a hard link. Because a hard link does not copy the actual file data, copying efficiency can be significantly improved, the actual hard disk storage space occupied is reduced, and the utilization of hard disk storage space is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 illustrates a flow diagram of a data replication method according to some embodiments of the invention;
FIG. 2 illustrates a flow diagram of a data replication method according to further embodiments of the present invention;
FIG. 3 shows a schematic block diagram of a data replication apparatus according to an exemplary embodiment of the present invention;
FIG. 4 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device to implement an embodiment of the invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
FIG. 1 illustrates a flow diagram of a data replication method according to some embodiments of the invention. Referring to fig. 1, the data replication method may include the steps of:
Step S110, acquiring the position of the source data node where a source file block is located and the position of the destination data node where a destination file block is located;
Step S120, judging whether the position of the source data node and the position of the destination data node belong to the same data node;
Step S130, when they are judged to belong to the same data node, copying the source file block by means of a hard link;
Step S140, when they are judged not to belong to the same data node, copying the source file block to the destination file block by means of a data copy.
According to the data replication method in the example embodiment of fig. 1, on the one hand, the position of the source data node of the source file block and the position of the destination data node of the destination file block are obtained, so that it can be determined from these positions whether the source file block and the destination file block belong to the same data node; on the other hand, when they belong to the same data node, the source file block is copied by means of a hard link. Because a hard link does not copy the actual file data, copying efficiency can be significantly improved, the actual hard disk storage space occupied is reduced, and the utilization of hard disk storage space is improved.
Next, a data copying method in the exemplary embodiment of fig. 1 will be described in detail.
In step S110, the location of the source data node where the source file block is located and the location of the destination data node where the destination file block is located are obtained.
In an example embodiment, all file block information of the source file to be copied may be queried from the name node (NameNode), and the location information of the data nodes where the source file blocks are located may be obtained. In addition, the location of the destination data node where the destination file block is located may also be obtained from the name node. The name node, i.e., the NameNode, is the administrator of the distributed file system and is responsible for managing the file system namespace, the replication of data blocks, and the like. A data node, i.e., a DataNode, is the basic unit of file storage and stores the contents of files in the distributed file system in the form of data blocks or file blocks.
Step S120, determining whether the position of the source data node and the position of the destination data node belong to the same data node.
In an example embodiment, it is determined whether the source data node position of the data node where the source file block is located and the destination data node position of the data node where the destination file block is located belong to the same data node.
Step S130, when they are judged to belong to the same data node, copying the source file block by means of a hard link.
In an example embodiment, when it is determined that the source data node of the source file block and the destination data node of the destination file block are the same data node, the source file block is copied by means of a hard link, that is, a local copy: a new link pointing to the source file block is created using the hard link mechanism of the Linux system.
A hard link is equivalent to an alias of a file block. It refers to the file's index node (inode), rather than to a file path as a soft link does. Therefore, after a hard link is deleted, the underlying file data is removed only if no other hard link still refers to that inode; otherwise the data is retained.
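The hard link behavior described above can be observed with a small local experiment. The following is a minimal sketch using Python's standard `os.link`; the file names `blk_source` and `blk_dest` and the function name `demo_hard_link` are hypothetical and are not taken from the patent.

```python
import os
import tempfile


def demo_hard_link(workdir: str) -> dict:
    """Create a block file, hard-link it, delete the original name, and report what happened."""
    src = os.path.join(workdir, "blk_source")  # hypothetical block file name
    dst = os.path.join(workdir, "blk_dest")    # hypothetical link name
    with open(src, "wb") as f:
        f.write(b"block-data")

    os.link(src, dst)  # adds a second directory entry for the same inode; no data is copied
    same_inode = os.stat(src).st_ino == os.stat(dst).st_ino
    links_before = os.stat(src).st_nlink

    os.remove(src)  # removes one name only; the data stays while another link exists
    with open(dst, "rb") as f:
        data_after = f.read()
    links_after = os.stat(dst).st_nlink

    return {
        "same_inode": same_inode,
        "links_before": links_before,
        "links_after": links_after,
        "data_after": data_after,
    }
```

Both names resolve to one inode, so the link count drops from two to one when the original name is removed, and the content remains readable through the remaining name.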
Step S140, when it is determined that the source file block does not belong to the same data node, copying the source file block to the destination file block in a data copy manner.
In an example embodiment, when it is determined that the source data node of the source file block and the destination data node of the destination file block are not the same data node, the source file block is copied to the destination file block by means of a data copy. The data copy must write the data of the source file block to another data node, so actual data reads and writes occur.
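Steps S110 to S140 can be sketched as a single decision function. This is only an illustration under a strong simplifying assumption: each data node is modeled as a local directory, so "same data node" becomes "same directory". The function name `copy_block` and its parameters are hypothetical.

```python
import os
import shutil
import tempfile


def copy_block(src_node_dir: str, src_block: str, dst_node_dir: str, dst_block: str) -> str:
    """Copy one block; hard-link when source and destination are on the same (simulated) data node."""
    src_path = os.path.join(src_node_dir, src_block)
    dst_path = os.path.join(dst_node_dir, dst_block)

    # S120: compare the two node positions (assumption: a node is a local directory).
    if os.path.realpath(src_node_dir) == os.path.realpath(dst_node_dir):
        # S130: same node, so create a hard link instead of copying bytes.
        os.link(src_path, dst_path)
        return "hard_link"

    # S140: different nodes, so fall back to a real data copy.
    shutil.copyfile(src_path, dst_path)
    return "data_copy"
```

In a real distributed system the second branch would be a network transfer between data nodes; `shutil.copyfile` merely stands in for that here.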
Further, in an example embodiment, the directory of the source file in the name node is traversed to obtain all source file block information of the source file; the position of the source data node of each source file block, namely the data node to which the source file block belongs, is acquired from the source file block information, and a destination file block is created based on the source file block information and the position of the source data node; and a replication task (copy task) is generated based on the source file block, the position of the source data node, and the destination file block. When the destination file block is created, the destination file block information is generated in the name node. To prevent a copy from failing because a node has insufficient usable space, the data node information for the destination is generated so that data is preferentially copied to the same data node as the source file's data block; when that data node has insufficient space, the block can be copied to another data node.
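The task generation described above can be sketched as follows. The source does not say how "insufficient space" is measured or how the fallback node is chosen; this sketch assumes a per-node free-byte count compared against the block size, and picks the other node with the most free space. All names (`BlockInfo`, `CopyTask`, `plan_copy_tasks`, the `_copy` suffix) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BlockInfo:
    block_id: str
    node: str   # data node that holds the source block
    size: int   # block size in bytes


@dataclass(frozen=True)
class CopyTask:
    src_block: str
    src_node: str
    dst_block: str
    dst_node: str


def plan_copy_tasks(blocks: List[BlockInfo], free_space: Dict[str, int]) -> List[CopyTask]:
    """Prefer the source block's own node as destination; fall back when it lacks space."""
    tasks = []
    for blk in blocks:
        dst_node = blk.node  # preferred: same node, which allows a hard link later
        if free_space.get(blk.node, 0) < blk.size:
            # Assumed fallback policy: the other node with the most free space.
            others = [n for n in free_space if n != blk.node]
            if others:
                dst_node = max(others, key=lambda n: free_space[n])
        tasks.append(CopyTask(blk.block_id, blk.node, blk.block_id + "_copy", dst_node))
    return tasks
```

Tasks whose `dst_node` equals `src_node` can later be served by a hard link, while the others require a data copy.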
Further, in the example embodiment, the source file block and destination file block information in the copy task may be read in a multithreaded manner, and the name node may be contacted to obtain the location of the data node where each source file block is located.
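A multithreaded lookup of this kind might look like the sketch below. The name node is replaced by an in-memory dictionary (`block_locations`), which is purely an assumption for illustration; a real client would issue a remote call at that point. The function name `lookup_locations` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def lookup_locations(
    tasks: List[Tuple[str, str]],
    block_locations: Dict[str, str],
    max_workers: int = 4,
) -> List[Tuple[str, str, str, str]]:
    """Resolve (source block, destination block) pairs to their data nodes using a thread pool."""

    def lookup(task: Tuple[str, str]) -> Tuple[str, str, str, str]:
        src_block, dst_block = task
        # Stand-in for a name node query (assumption: a plain dictionary lookup).
        return (src_block, block_locations[src_block], dst_block, block_locations[dst_block])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the input tasks.
        return list(pool.map(lookup, tasks))
```

Threads suit this step because each lookup mostly waits on the name node rather than using the CPU.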
In addition, in the example embodiment, because hard links are used, the source file block and the destination file block point to the same data block, so modifying the actual file block or data block would change the content of both; special processing is therefore needed. Specifically, when a data update request for the source file, such as an append operation, is received, the file block to be updated is determined from the name node based on the data update request; it is judged whether the file block to be updated has a hard link; when a hard link is judged to exist, a temporary file block is created to copy the content of the source file block, and the data update operation is performed on the temporary file block; and when no hard link is judged to exist, the data update operation is performed directly on the file block to be updated.
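The update handling above resembles copy-on-write and can be sketched with the link count reported by `os.stat`. The source stops at "perform the update on the temporary file block"; swapping the temporary block into place with `os.replace` is an assumption added here so that the example is complete. The function name `append_to_block` and the `.tmp` suffix are hypothetical.

```python
import os
import shutil
import tempfile


def append_to_block(block_path: str, data: bytes) -> str:
    """Append to a block file without changing other hard links to the same data."""
    if os.stat(block_path).st_nlink > 1:
        # Another name shares this inode: copy to a temporary block and update the copy.
        tmp_path = block_path + ".tmp"
        shutil.copyfile(block_path, tmp_path)
        with open(tmp_path, "ab") as f:
            f.write(data)
        # Assumption: the temporary block then takes over this name, leaving the
        # old inode (and every other link to it) untouched.
        os.replace(tmp_path, block_path)
        return "copied_then_updated"

    # No other hard link exists: update the block directly.
    with open(block_path, "ab") as f:
        f.write(data)
    return "updated_in_place"
```

After the first update the block no longer shares an inode with its former link, so later updates take the direct path.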
FIG. 2 shows a flow diagram of a data replication method according to further embodiments of the present invention.
Referring to fig. 2, in step S210, a data replication request sent by a client is received. For example, the client initiates a copy request, communicates with the data node where the source file block is located, and sends the data copy request to that data node.
Further, in an example embodiment, the directory of the source file in the name node is traversed to obtain all source file block information of the source file; the position of the source data node of each source file block, namely the data node to which the source file block belongs, is acquired from the source file block information, and a destination file block is created based on the source file block information and the position of the source data node; and a replication task (copy task) is generated based on the source file block, the position of the source data node, and the destination file block. When the destination file block is created, the destination file block information is generated in the name node. To prevent a copy from failing because a node has insufficient usable space, the data node information for the destination is generated so that data is preferentially copied to the same data node as the source file's data block; when that data node has insufficient space, the block can be copied to another data node.
In step S220, the data copy request is read in a multi-thread manner, the source file block and the destination file block are obtained from the data copy request, and the data node position where the source file block is located and the data node position where the destination file block is located are obtained by communicating with the name node.
In step S230, when the source data node of the source file block and the destination data node of the destination file block are the same data node, i.e., data node 1, the source file block is copied by means of a hard link, that is, a local copy: a new link pointing to the source file block is created using the hard link mechanism of the Linux system. Creating a link generally takes on the order of milliseconds, so copying efficiency can be significantly improved. Further, the information of the hard-linked block may also be transmitted to the name node.
In step S240, when the source data node of the source file block and the destination data node of the destination file block are not the same data node, that is, the source file block belongs to data node 1 and the destination file block belongs to data node 2, the source file block is copied from data node 1 to the destination file block on data node 2 by means of a data copy. The data copy must write the data of the source file block to another data node, so actual data reads and writes occur. This data copy works the same way as copying with distcp and cp.
In the example embodiment, data is continuously and iteratively updated and produced based on the original version; even when a new version is produced, part of the files are overwritten directly, so no update operation on individual file blocks occurs. When files are copied with the native cp or distcp command, copying one data version often takes several hours and doubles the space used. With the technical scheme of the embodiment of the invention, the iteration cycle of a data version is greatly shortened: copying one data version can be completed within minutes or even seconds, and the space actually occupied by the data does not increase, so labor and time costs are significantly reduced and hardware costs are lowered.
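The space claim can be made concrete with simple arithmetic: a full copy adds the size of every block, while a link-aware copy adds only the blocks that must move to a different data node. The sketch below uses a hypothetical helper, `extra_bytes`, and made-up block sizes; it is not a measurement from the source.

```python
from typing import Iterable, Tuple


def extra_bytes(blocks: Iterable[Tuple[int, bool]]) -> Tuple[int, int]:
    """Given (size_in_bytes, same_node) pairs, return (full_copy_bytes, link_aware_bytes)."""
    block_list = list(blocks)
    # A cp/distcp style copy writes every block again.
    full_copy = sum(size for size, _ in block_list)
    # With hard links, only blocks on a different node consume new data space.
    link_aware = sum(size for size, same_node in block_list if not same_node)
    return full_copy, link_aware
```

For example, `extra_bytes([(128, True), (128, True), (64, False)])` yields `(320, 64)`: two of the three blocks are co-located and cost no additional data space, ignoring the small metadata overhead of the links themselves.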
Further, in an embodiment of the present invention, there is also provided a data replication apparatus that can be applied to a distributed system having a plurality of data nodes. Referring to fig. 3, the data replication apparatus 300 may include: an information acquisition unit 310, a judging unit 320, a local replication unit 330, and a data copying unit 340. The information acquisition unit 310 is configured to acquire the position of the source data node where a source file block is located and the position of the destination data node where a destination file block is located; the judging unit 320 is configured to judge whether the position of the source data node and the position of the destination data node belong to the same data node; the local replication unit 330 is configured to copy the source file block by means of a hard link when the positions are judged to belong to the same data node; and the data copying unit 340 is configured to copy the source file block to the destination file block by means of a data copy when the positions are judged not to belong to the same data node.
In some embodiments of the present invention, based on the foregoing solution, the data replication apparatus 300 further includes: a determining unit, configured to determine, when a data update request for the source file is received, a file block to be updated from a name node based on the data update request; a hard link judging unit, configured to judge whether the file block to be updated has a hard link; a first updating unit, configured to create a temporary file block to copy the content of the source file block and perform the data update operation on the temporary file block when a hard link is judged to exist; and a second updating unit, configured to perform the data update operation directly on the file block to be updated when no hard link is judged to exist.
In some embodiments of the present invention, based on the foregoing solution, the data replication apparatus 300 further includes: a source file block information obtaining unit, configured to traverse a directory of source files in the name node, and obtain all source file block information of the source files; a destination file block creating unit, configured to obtain the location of the source data node from the source file block information, and create a destination file block based on the source file block information and the location of the source data node; and the replication task generating unit is used for generating a replication task based on the source file block, the position of the source data node and the destination file block.
In some embodiments of the present invention, based on the foregoing solution, the data replication apparatus 300 further includes: a position acquisition unit, configured to acquire, from the replication task, a position of the source data node where the source file block is located; and the node determining unit is used for determining a destination data node where the destination file block is located based on the position of the source data node.
In some embodiments of the present invention, based on the foregoing scheme, the node determining unit is configured to: determine the source data node as the destination data node of the destination file block.
In some embodiments of the present invention, based on the foregoing solution, the position acquisition unit is configured to: obtain the location of the source data node where the source file block is located from the replication task in a multithreaded manner.
In some embodiments of the present invention, based on the foregoing scheme, the local replication unit 330 is configured to: create, by the source data node, a link for the destination file block that points to the source file block.
Since the functional modules of the data replication apparatus 300 according to the exemplary embodiment of the present invention correspond to the steps of the exemplary embodiment of the data replication method, they are not described again here.
In an exemplary embodiment of the present invention, there is also provided an electronic device capable of implementing the above method.
Referring now to FIG. 4, a block diagram of a computer system 400 suitable for use with the electronic device implementing an embodiment of the invention is shown. The computer system 400 of the electronic device shown in fig. 4 is only an example, and should not bring any limitation to the function and the scope of use of the embodiments of the present invention.
As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU)401 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for system operation are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display device such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read from it can be installed into the storage section 408 as needed.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The above-described functions defined in the system of the present application are executed when the computer program is executed by a Central Processing Unit (CPU) 401.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the data copying method as described in the above embodiments.
For example, the electronic device may implement the following steps, as shown in fig. 1: step S110, acquiring the position of the source data node where a source file block is located and the position of the destination data node where a destination file block is located; step S120, judging whether the position of the source data node and the position of the destination data node belong to the same data node; step S130, when it is judged that they belong to the same data node, copying the source file block by means of a hard link; step S140, when it is judged that they do not belong to the same data node, copying the source file block to the destination file block by means of data copying.
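The branch between steps S130 and S140 can be illustrated with a minimal Python sketch. It is not the implementation of the embodiments: the function name `replicate_block`, the node identifiers, and the use of a local `shutil.copyfile` as a stand-in for a transfer between data nodes are all assumptions made for illustration only.

```python
import os
import shutil


def replicate_block(src_path, dst_path, src_node, dst_node):
    """Copy one file block; returns "hard_link" or "data_copy" (hypothetical helper)."""
    # S120: compare the data node positions obtained in S110.
    if src_node == dst_node:
        # S130: same data node, so only a new directory entry is created;
        # no block data is read or written.
        os.link(src_path, dst_path)
        return "hard_link"
    # S140: different data nodes. A local byte copy stands in here for the
    # transfer between nodes, which this sketch does not model.
    shutil.copyfile(src_path, dst_path)
    return "data_copy"
```

In the hard-link branch both names refer to the same stored data, which is why no additional disk space is consumed for the block content.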
It should be noted that although several modules or units of a device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the invention, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, a network device, etc.) to execute the method according to the embodiment of the present invention.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A data replication method applied to a distributed system with a plurality of data nodes is characterized by comprising the following steps:
acquiring the position of a source data node where a source file block is located and the position of a destination data node where a destination file block is located;
judging whether the position of the source data node and the position of the destination data node belong to the same data node;
when it is judged that they belong to the same data node, copying the source file block by means of a hard link;
and when it is judged that they do not belong to the same data node, copying the source file block to the destination file block by means of data copying.
2. The data replication method of claim 1, further comprising:
when a data updating request for a source file is received, determining a file block to be updated from a name node based on the data updating request;
judging whether the file block to be updated has a hard link;
when it is judged that a hard link exists, creating a temporary file block to copy the content of the source file block, and performing the data updating operation on the temporary file block;
and when it is judged that no hard link exists, performing the data updating operation directly on the file block to be updated.
3. The data replication method of claim 1, further comprising:
traversing the directory of the source file in the name node to acquire all source file block information of the source file;
acquiring the position of the source data node from the source file block information, and creating a destination file block based on the source file block information and the position of the source data node;
and generating a replication task based on the source file block, the position of the source data node and the destination file block.
4. The data replication method of claim 3, further comprising:
acquiring the position of the source data node where the source file block is located from the replication task;
and determining a destination data node where the destination file block is located based on the position of the source data node.
5. The data replication method of claim 4, wherein determining the destination data node where the destination file block is located based on the location of the source data node comprises:
and determining the position of the source data node as a destination data node of the destination file block.
6. The data replication method of claim 4, wherein obtaining the position of the source data node where the source file block is located from the replication task comprises:
and obtaining the position of the source data node where the source file block is located from the replication task in a multithreaded manner.
7. The data replication method of any one of claims 1 to 6, wherein copying the source file block by means of a hard link comprises:
and creating, by the source data node, a link for the destination file block that points to the source file block.
8. A data replication apparatus applied to a distributed system having a plurality of data nodes, comprising:
the information acquisition unit is used for acquiring the position of a source data node where the source file block is located and the position of a destination data node where the destination file block is located;
the judging unit is used for judging whether the position of the source data node and the position of the destination data node belong to the same data node;
the local copying unit is used for copying the source file block by means of a hard link when it is judged that they belong to the same data node;
and the data copying unit is used for copying the source file block to the destination file block by means of data copying when it is judged that they do not belong to the same data node.
9. An electronic device, comprising: a processor; and a memory having computer readable instructions stored thereon which, when executed by the processor, implement the data replication method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the data replication method according to any one of claims 1 to 7.
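The update handling recited in claim 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `has_hard_link` and `update_block` are hypothetical, the "data updating operation" is reduced to appending bytes, and the final step of swapping the temporary file block into place with `os.replace` is an assumption, since the claim does not state what happens to the temporary block afterwards.

```python
import os
import shutil
import tempfile


def has_hard_link(block_path):
    # A link count above one means another name refers to the same data.
    return os.stat(block_path).st_nlink > 1


def update_block(block_path, payload):
    """Append payload to a block without altering blocks hard-linked to it."""
    if has_hard_link(block_path):
        # Create a temporary file block, copy the content, update the copy.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(block_path) or ".")
        os.close(fd)
        shutil.copyfile(block_path, tmp_path)
        with open(tmp_path, "ab") as tmp:
            tmp.write(payload)
        # Assumption: the updated temporary block then takes over the name,
        # leaving the other hard link with the original content.
        os.replace(tmp_path, block_path)
        return "copied_then_updated"
    # No hard link exists: update the block directly.
    with open(block_path, "ab") as block:
        block.write(payload)
    return "updated_in_place"
```

Checking the link count before writing prevents an update to one file from silently changing the content seen through the other hard link.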

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811387721.7A CN109614383B (en) 2018-11-21 2018-11-21 Data copying method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811387721.7A CN109614383B (en) 2018-11-21 2018-11-21 Data copying method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109614383A (en) 2019-04-12
CN109614383B (en) 2021-01-15

Family

ID=66004675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811387721.7A Active CN109614383B (en) 2018-11-21 2018-11-21 Data copying method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109614383B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988697A (en) * 2021-05-11 2021-06-18 北京华云安信息技术有限公司 Target file copying method, device, equipment and computer readable storage medium
CN115688187B (en) * 2023-01-04 2023-03-21 中科方德软件有限公司 Method, device and equipment for safety management of hard link data and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761162A (en) * 2014-01-11 2014-04-30 深圳清华大学研究院 Data backup method of distributed file system
CN107239480A (en) * 2016-03-28 2017-10-10 阿里巴巴集团控股有限公司 The method and apparatus that renaming operation is performed for distributed file system
CN108845892A (en) * 2018-04-19 2018-11-20 北京百度网讯科技有限公司 Data processing method, device, equipment and the computer storage medium of distributed data base

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2516532C (en) * 2003-02-21 2011-09-20 Nippon Telegraph And Telephone Corporation Device and method for correcting a path trouble in a communication network
CN102170440B (en) * 2011-03-24 2013-12-04 北京大学 Method suitable for safely migrating data between storage clouds
CN103685368B (en) * 2012-09-10 2017-04-12 中国电信股份有限公司 method and system for migrating data
JP5895099B2 (en) * 2012-10-11 2016-03-30 株式会社日立製作所 Destination file server and file system migration method
US8930311B1 (en) * 2012-12-14 2015-01-06 Netapp, Inc. Push-based piggyback system for source-driven logical replication in a storage environment
US9933956B2 (en) * 2013-09-05 2018-04-03 Nutanix, Inc. Systems and methods for implementing stretch clusters in a virtualization environment
CN108268542A (en) * 2016-12-31 2018-07-10 中国移动通信集团河北有限公司 For the method and system of data-base cluster Data Migration

Also Published As

Publication number Publication date
CN109614383A (en) 2019-04-12

Similar Documents

Publication Publication Date Title
CN108845816B (en) Application program updating method, system, computer device and storage medium
WO2020119485A1 (en) Page display method and device, apparatus, and storage medium
CN107870728B (en) Method and apparatus for moving data
US10248551B2 (en) Selective object testing in a client-server environment
US20170052884A1 (en) Generic test automation for restful web services applications
US10997247B1 (en) Snapshot tracking using a graph database
CN109614383B (en) Data copying method and device, electronic equipment and storage medium
CN112965945A (en) Data storage method and device, electronic equipment and computer readable medium
CN112395253A (en) Index file generation method, terminal device, electronic device and medium
CN112597126A (en) Data migration method and device
CN114817146A (en) Method and device for processing data
CN111107133A (en) Generation method of difference packet, data updating method, device and storage medium
CN114153473A (en) Module integration method, device, storage medium and electronic equipment
CN111367500A (en) Data processing method and device
CN115167822A (en) Branch code merging method, device, equipment and storage medium
CN113722007B (en) Configuration method, device and system of VPN branch equipment
CN115695416A (en) File downloading method, device, medium and equipment based on cloud storage service
CN113127430B (en) Mirror image information processing method, mirror image information processing device, computer readable medium and electronic equipment
CN111428453B (en) Processing method, device and system in annotation synchronization process
CN110727889A (en) Static webpage resource loading method, device, medium and electronic equipment
US9880904B2 (en) Supporting multiple backup applications using a single change tracker
CN110377326B (en) Installation package generation method, installation package generation device, development device and computer readable medium
CN117112500B (en) Resource management method, device, equipment and storage medium
CN113760860B (en) Data reading method and device
US11379147B2 (en) Method, device, and computer program product for managing storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant