CN1276349C - Method for mirror backup of cluster platform cross parallel system

Publication number
CN1276349C
Authority
CN
China
Prior art keywords
backup
node
file
address
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CN 03148518
Other languages
Chinese (zh)
Other versions
CN1567198A (en)
Inventor
刘晓光
赵玉萍
周隆跃
李电森
柳书广
肖利民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN 03148518 (CN1276349C)
Publication of CN1567198A
Application granted
Publication of CN1276349C
Anticipated expiration
Expired - Lifetime

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Hardware Redundancy (AREA)

Abstract

The present invention discloses a method for cross-platform parallel system image backup in a cluster. The method comprises at least the following steps: after initial setup, start a plurality of node machines to be backed up and send them a timed backup command; then determine whether the scheduled time has arrived, and if so proceed to the next step, otherwise continue waiting; generate the boot image used for remote backup and remotely restart the node machines; the node machines then automatically perform the system backup according to the program in the system backup script contained in the boot image, and the systems are copied in full and in parallel to a remote node machine. After the backup, each node machine restarts and automatically returns to the system it ran before the backup. The invention effectively solves the problem of parallel image backup in large-scale clusters: during a backup operation, the hard disks of multiple node machines undergo system-level backup simultaneously, and the potentially large volume of hard-disk data can be compressed with a designated compression method.

Description

A method for cross-platform parallel system image backup in a cluster
Technical field
The present invention relates to computer system management technology, and in particular to a method for performing image backup in a cross-platform parallel system composed of a computer cluster.
Background art
A cluster is a set of mutually independent computers (also called nodes) interconnected by a high-speed network. A cluster system manages the cluster as a single system, making full use of the resources of each computer in the cluster while providing a parallel processing system for complex computation.
Backup of the cluster system has always been an important topic for cluster management systems. It differs from the backup of a single computer node (hereinafter, node) in three ways: (1) cluster backup is multi-node backup, covering the data of multiple nodes; (2) cluster backup is remote backup performed over the network; (3) cluster backup is required to be concurrent, that is, multiple nodes are backed up at the same time, as if a single node were being backed up.
Cluster backup can be divided simply into file-level backup and system-level backup. File-level backup covers ordinary files or directories. System-level backup covers file system backup, disk partition backup and hard disk backup; hard disk backup is also called "hard disk mirroring", "hard disk imaging" or "hard disk cloning". By how close to real time the data is backed up, backup can also be divided into synchronous and asynchronous backup. Technologies commonly used for synchronous backup include RAID (redundant array of inexpensive disks) and high availability (HA). Synchronous system backup requires a complete real-time backup of the whole software system while the system is running; disaster-tolerant backup technology belongs to this category, and at present only large companies with very strong technical capabilities have mature disaster-tolerant backup technology.
Asynchronous backup places lower demands on real-time behaviour: the backup may be delayed, so that the data of the target system as it stood some time earlier is backed up, and the system may be stopped temporarily while it is backed up.
At present, asynchronous system backup of a cluster is still mainly based on an "online" mode in which client nodes back up to a backup server node, that is, network copying between node machines while the network is connected. This approach is effective for file-level backup but almost useless for system-level image backup, because system-level backup must preserve the contents of the system boot sector, which ordinary copy commands cannot copy correctly, and because some system files cannot be accessed while the system is running. System-level image backup therefore needs a special method, for example restarting the system from a system boot disk and then performing the image backup in "offline" mode (offline relative to "online", not truly offline); Ghost is the typical representative. The drawback is obvious: a system boot disk must first be created on external media (a floppy disk).
Today, the BIOS of most mainboards that support SCSI devices includes PXE (Preboot Execution Environment), which simplifies dynamic host address allocation. Linux, moreover, can be built with a small kernel: a very small kernel plus a few supporting files is enough to build a minimal operating system, and on top of it ordinary system commands can carry out fairly complex operations such as compression, remote calls, backup and recovery.
If an "offline" backup technique could be designed that makes good use of both new hardware technology and Linux software, one that supports system-level image backup, scheduled automatic backup and more than one compression method, works in parallel, and needs no human intervention, it would be a significant contribution to the prior art.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method for cross-platform parallel system image backup in a cluster. Using the invention greatly reduces the operational complexity of backup and recovery, performs system backups in parallel, and increases backup speed, thereby improving work efficiency.
The method for cross-platform parallel system image backup in a cluster according to the present invention mainly comprises the following steps:
1. After initial setup, start a plurality of node machines to be backed up and send the timed backup command;
2. Determine whether the scheduled time has arrived; if not, continue waiting; if so, proceed to the next step;
3. Generate the boot image used for remote backup and remotely restart the node machines; the boot image contains the system backup script;
4. The node machines automatically perform the system backup according to the program in the system backup script in the boot image, and the systems are copied in full and in parallel to a remote node machine;
5. After the backup is finished, each node machine restarts and automatically returns to the system it ran before the backup.
The present invention achieves parallel system backup of the nodes of a large-scale cluster, exploiting both the advantages of PXE and the characteristics of Linux. Its main advantages and innovations are:
It effectively solves the problem of parallel image backup in a large-scale cluster: in one backup operation, the hard disks of multiple node machines undergo system-level backup simultaneously.
Hard-disk data, which may be massive, is compressed with a designated compression method.
Whether the hard disk holds a Linux system or a Windows system, after backup and subsequent recovery the system returns to its pre-backup state and works normally.
A timed backup function is added: at a designated time (such as a weekend or holiday), concurrent system backup of multiple nodes runs automatically, and the system backup of multiple nodes completes entirely without manual intervention.
Brief description of the drawings
Fig. 1 is the overall flowchart of the method of the invention;
Fig. 2 is the flowchart of the remote backup in the method of the invention;
Fig. 3 is the flowchart of node restart after backup in the method of the invention.
Detailed description of the embodiments
For a system administrator, backup is a recurring concern. Faced with a large number of servers, backing up with Ghost or other conventional methods means a heavy workload: nodes are backed up one by one, efficiency is very low, and backup and recovery operations are very complicated.
The present invention mainly addresses parallel system backup of many nodes in a large cluster, cross-platform backup, automatic backup and compressed backup.
As shown in Fig. 1, after the console sends the system backup command to multiple nodes in parallel, the node machines, at the scheduled time and almost simultaneously, automatically generate a boot image for each node machine; the boot image contains the system backup script. The node machines then restart almost simultaneously and enter a minimal Linux operating system environment in which they can communicate remotely with the backup server. The nodes then automatically perform the system backup according to the program in the system backup script, copying the complete systems in parallel to the remote backup server; the backup file can be compressed during copying with a designated compression method. When the backup is finished, the nodes automatically return to the systems they ran before the backup. The whole process requires no human intervention.
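The parallel dispatch from the console can be illustrated with a minimal Python sketch. It is not the patented implementation: `dispatch_parallel` and the `send` callable are hypothetical names, and `send` stands in for whatever remote call actually delivers the command to a node. The timeout models the console carrying on when some node does not answer in time.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def dispatch_parallel(nodes, send, timeout_s):
    """Send a command to every node concurrently.

    `send` is a hypothetical callable (node -> result) standing in for the
    real remote call. Returns (results, unfinished_nodes).
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(nodes)))
    futures = {pool.submit(send, node): node for node in nodes}
    done, not_done = wait(futures, timeout=timeout_s)
    # Do not block on stragglers; the caller decides how to handle them.
    pool.shutdown(wait=False)
    results = {}
    for fut in done:
        node = futures[fut]
        try:
            results[node] = fut.result()
        except Exception as exc:  # one failed node must not stop the others
            results[node] = exc
    unfinished = sorted(futures[f] for f in not_done)
    return results, unfinished
```

One worker thread per node keeps the sketch simple; for a very large cluster a bounded pool would be the more realistic choice.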
When physical damage to a hard disk or a software fault in a disk partition prevents the system from starting normally, recovery with the present invention quickly restores the original system, without having to reinstall the operating system, then reinstall the various application software, and finally carry out complicated configuration.
Fig. 2 shows the specific implementation of the remote backup in the method of the invention.
First, restart nodes 1 to N. After the power-on self-test finishes, enter the CMOS setup screen of each node and adjust the boot sequence of nodes 1 to N, placing boot from "PXE" ahead of all other devices so that "PXE" is the first boot device. Then download the pxelinux.0 file and place it in the TFTP (Trivial File Transfer Protocol) root directory of the backup server.
Next, set up the related services of the backup server, including:
Enable the background service for scheduled command execution on nodes 1 to N. On a Linux node, to start the scheduling service, simply run "/etc/rc.d/init.d/crond restart". To have the scheduling service run at every boot, run "/sbin/chkconfig crond on" on the command line as the superuser. On a node running Windows 2000 Server, to allow programs to run at specified times, start the "Task Scheduler" service under "Services" in "Administrative Tools".
Enable the RSH (remote shell) service on the backup server and put the IP addresses of nodes 1 to N in the RSH configuration file /root/.rhosts, allowing nodes 1 to N to access the backup server through RSH.
Enable the TFTP service on the backup server so that other nodes can download files over the TFTP protocol. On the backup server, running "/sbin/chkconfig tftp on" and "/etc/rc.d/init.d/xinetd restart" on the command line as the superuser is sufficient.
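As a small illustration of the RSH setup, the following Python sketch renders the lines that would go into /root/.rhosts, one trusted host per line followed by the permitted user. The function name `rhosts_lines`, the example addresses and the choice of the root user are assumptions made for illustration; the source only states that the node IP addresses are placed in the file.

```python
def rhosts_lines(node_ips, user="root"):
    """Return one `.rhosts` line per node in the form '<host> <user>'.

    Duplicate addresses are dropped while the original order is kept.
    """
    seen = set()
    lines = []
    for ip in node_ips:
        if ip not in seen:
            seen.add(ip)
            lines.append(f"{ip} {user}")
    return lines


# Example with made-up addresses for three nodes.
example = rhosts_lines(["192.168.0.11", "192.168.0.12", "192.168.0.11"])
```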
After the above steps, the actual operations are carried out as follows.
The console sends the system backup command, with its scheduled time, to nodes 1 to N in parallel. On receiving the timed system backup command sent by the backup server, each of nodes 1 to N extracts the time and the command, writes them in scheduled-command format into the scheduled-command configuration file, and restarts the crond service (a Windows node executes the scheduling command directly). As soon as the scheduled time arrives, nodes 1 to N simultaneously, passing information such as their own MAC address and IP address as execution parameters, perform the following operations on the backup server by remote call.
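How a Linux node might turn the timed command into a scheduled entry can be sketched in Python as follows. The source does not specify the message format, so the `YYYY-MM-DD HH:MM|command` layout, the function names and the example command path are all assumptions; only the five-field crontab layout (minute, hour, day of month, month, day of week) is standard.

```python
from datetime import datetime


def parse_timed_command(message):
    """Split 'YYYY-MM-DD HH:MM|command' (an assumed wire format) into parts."""
    when_text, command = message.split("|", 1)
    when = datetime.strptime(when_text.strip(), "%Y-%m-%d %H:%M")
    return when, command.strip()


def crontab_line(when, command):
    """Format a user-crontab entry: minute hour day-of-month month day-of-week."""
    return f"{when.minute} {when.hour} {when.day} {when.month} * {command}"


# Hypothetical command path, used only for the example.
when, command = parse_timed_command("2003-06-28 23:30|/usr/local/bin/request_backup")
entry = crontab_line(when, command)
```

Because a crontab entry carries no year, an entry written this way would need to be removed after it fires, or it would run again a year later.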
Mount a shared file, initrd, on a temporary directory to produce a temporary file system containing the shared boot modules, communication modules and the essential backup and compression commands. If the temporary directory is already mounted, wait 1 second and try again. The initrd file is generated by a dedicated script and contains the shared modules, commands and library files.
Insert into the bin/init file under the temporary directory the commands that load the corresponding modules, the commands that activate the network card and configure this node's IP address, and the commands that back up this node's hard disk through RSH and compress the backup, thereby customizing the init file executed by the minimal operating system. The last instruction in this file runs a special reboot command.
Copy the corresponding adapter driver module into a subdirectory of the temporary directory.
Unmount the temporary file system, compress the customized initrd file, rename the compressed file as a boot image file named after the node, and place it in the TFTP root directory.
Generate the PXE boot file according to PXE rules; the boot file contains the name of the boot image file named after the node.
Generate the DHCP (Dynamic Host Configuration Protocol) configuration file "/etc/dhcpd.conf" containing the IP address and MAC information of nodes 1 to N.
Wait for all of nodes 1 to N to finish the above steps. If some node machine cannot finish them for some reason, wait for a period of time and then force execution to continue.
Determine whether any other DHCP server is running that might conflict; if not, start the DHCP service on the backup server.
Send the restart command to nodes 1 to N in parallel.
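The two configuration artefacts generated above, the per-node DHCP entry and the PXE boot file, can be sketched in Python as plain text templates. The keywords used (`host`, `hardware ethernet`, `fixed-address`, `filename`, and the PXELINUX directives `DEFAULT`, `LABEL`, `KERNEL`, `APPEND`) are standard ISC dhcpd and PXELINUX syntax, but the function names, the node name, the `vmlinuz` kernel name and the `.img` suffix are assumptions; the source only says the boot image is named after the node.

```python
def dhcp_host_entry(name, mac, ip, boot_file="pxelinux.0"):
    """Render one dhcpd.conf `host` block binding a MAC to a fixed IP."""
    return (
        f"host {name} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {ip};\n"
        f'    filename "{boot_file}";\n'
        f"}}\n"
    )


def pxe_boot_config(name, kernel="vmlinuz"):
    """Render a PXELINUX config that boots the image named after the node."""
    return (
        "DEFAULT backup\n"
        "LABEL backup\n"
        f"    KERNEL {kernel}\n"
        f"    APPEND initrd={name}.img\n"
    )


# Example for a hypothetical node.
dhcp_text = dhcp_host_entry("node1", "00:11:22:33:44:55", "192.168.0.11")
pxe_text = pxe_boot_config("node1")
```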
Fig. 3 is the flowchart of node restart after backup in the method of the invention.
After a node machine receives the restart instruction, it shuts down its operating system normally and restarts. Because PXE is the first boot item, the node machine first enters the PXE environment and sends out a message containing the MAC information of its network interface. On receiving this MAC address information, the DHCP server (the backup server) checks whether the DHCP configuration file contains a matching entry; if so, it finds the IP address matched to this MAC and returns to the node machine its IP address, boot file and other information. The node machine then derives a mapping from this address and searches the root directory of the DHCP server for the corresponding file; once that file is found, it looks in the root directory of the TFTP service for the boot image file corresponding to this MAC address.
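The lookup and the address mapping can be illustrated with a short Python sketch. The mapping described above resembles the PXELINUX convention of searching for a configuration file named after the client's IPv4 address written in uppercase hexadecimal; the source does not name that convention, so reading it this way is an assumption, as are the function names and the example addresses.

```python
import ipaddress


def lookup_lease(table, mac):
    """Return the IP bound to `mac`, or None when no entry matches.

    `table` maps lowercase MAC strings to IP strings.
    """
    return table.get(mac.lower())


def pxelinux_ip_name(ip):
    """Return the uppercase-hex form of an IPv4 address (8 hex digits)."""
    return f"{int(ipaddress.IPv4Address(ip)):08X}"


leases = {"00:11:22:aa:bb:cc": "192.168.0.11"}
ip = lookup_lease(leases, "00:11:22:AA:BB:CC")
config_name = pxelinux_ip_name(ip)
```

A node whose MAC has no entry gets `None` from the lookup, which corresponds to the case where the server offers no address and the node later falls back to booting from its hard disk.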
Once the boot image file is found, the DHCP server downloads the operating system kernel and the boot image file to the node machine via TFTP. The node machine loads the operating system kernel and sets up a simple file system in memory; the operating system then first executes the init file and, after loading the necessary modules, begins executing commands in the order defined by the script in the image file.
The script first configures the network interface with its original IP address and activates it, setting up the local route; after a probe lasting several seconds, the operating system sets up the routing table for external communication. The backup task then begins: a system copy command reads the local hard disk data, an RSH command sets up a remote procedure call, and the remote call establishes a data channel (also called a pipe) to the backup server. Data is compressed before passing through the channel, and the compressed data is stored at the far end of the channel.
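The read, compress and send pipeline can be sketched in Python with streaming compression. This is only an illustration of the data flow: the source performs the steps with system commands over an RSH pipe, whereas the sketch uses `zlib` and in-memory streams, and the name `backup_stream` is hypothetical.

```python
import io
import zlib


def backup_stream(source, sink, chunk_size=1 << 20):
    """Copy `source` to `sink`, compressing on the fly; return bytes read."""
    compressor = zlib.compressobj()
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        # Each chunk is compressed before it enters the channel.
        sink.write(compressor.compress(chunk))
    # Emit whatever the compressor is still buffering.
    sink.write(compressor.flush())
    return total


# In-memory streams stand in for the local disk and the remote channel.
disk = io.BytesIO(b"sector" * 1000)
remote = io.BytesIO()
copied = backup_stream(disk, remote)
```

Reading in fixed-size chunks keeps memory use constant regardless of disk size, which matters when the source is a whole hard disk.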
After all data has been read, a message is sent to the remote node to report that the system backup has finished; the MAC, IP address and other entries for this node machine are deleted from the DHCP server, and the DHCP service is restarted. The script then executes the modified reboot command.
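Removing a finished node's entry from the DHCP configuration can be sketched in Python as a text edit. The sketch assumes each node is described by a flat `host <name> { ... }` block with no nested braces; the function name and the sample configuration are made up for illustration.

```python
import re


def remove_host_entry(config_text, name):
    """Drop the `host <name> { ... }` block from dhcpd.conf-style text."""
    pattern = re.compile(
        r"^[ \t]*host[ \t]+" + re.escape(name) + r"[ \t]*\{[^}]*\}[ \t]*\n?",
        re.MULTILINE,
    )
    return pattern.sub("", config_text)


sample = (
    "host node1 {\n"
    "    hardware ethernet 00:11:22:33:44:55;\n"
    "    fixed-address 192.168.0.11;\n"
    "}\n"
    "host node10 {\n"
    "    hardware ethernet 00:11:22:33:44:66;\n"
    "    fixed-address 192.168.0.20;\n"
    "}\n"
)
remaining = remove_host_entry(sample, "node1")
```

Once the entry is gone and the service restarted, the node receives no address on its next PXE attempt, which is what lets it fall through to a normal hard disk boot.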
When the node restarts, PXE will again look for a dynamic IP address, but because there is no longer a corresponding DHCP configuration entry, the node obtains no dynamic address. PXE generally has a timeout; once it expires, the node machine boots normally from the hard disk, reads the boot information on the disk, and enters the normal operating system.
The recovery process is very similar to the backup process described above. Besides using the corresponding recovery commands and parameters, it differs from the backup process mainly as follows:
The parameters of the backup command include the name of the hard disk or partition to back up, whereas the corresponding parameter of the recovery command is the image file name (or tape drive device name);
If the backup used compression, recovery uses the corresponding decompression command;
When recovery reaches the point, after the node restarts, at which the recovery task begins, an RSH command first sets up a remote procedure call and a data channel (pipe) to the backup server; at the head of the channel a system copy command reads the remote data, the data is decompressed as it passes through the channel, and at the end of the channel the decompressed data is written to the local hard disk or partition.
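The recovery data flow, reading the remote image, decompressing it in transit and writing it to the local disk, can be sketched in Python. As with any such illustration, the source uses system commands over an RSH pipe rather than this code; `restore_stream` is a hypothetical name and `zlib` stands in for whichever compression method was designated at backup time.

```python
import io
import zlib


def restore_stream(source, sink, chunk_size=1 << 20):
    """Copy compressed `source` to `sink`, decompressing on the fly.

    Returns the number of decompressed bytes written.
    """
    decompressor = zlib.decompressobj()
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        data = decompressor.decompress(chunk)
        total += len(data)
        sink.write(data)
    # Write out anything the decompressor is still holding.
    tail = decompressor.flush()
    total += len(tail)
    sink.write(tail)
    return total


# In-memory streams stand in for the remote image and the local disk.
image = io.BytesIO(zlib.compress(b"sector" * 1000))
disk = io.BytesIO()
written = restore_stream(image, disk, chunk_size=64)
```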
In short, the present invention greatly facilitates system backup of many nodes in a cluster. It can back up multiple kinds of operating systems in full; that is, it can make a complete image backup of multiple disk partitions or whole disks, regardless of which operating system is installed on the partitions and disks. The invention is a cross-platform, asynchronous, system-level image backup method suitable for clusters.
Finally, it should be noted that the above embodiments only illustrate, and do not limit, the technical solution of the present invention. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the invention may still be modified or equivalently replaced, and that any modification or partial replacement that does not depart from the spirit and scope of the invention shall be covered by the claims of the invention.

Claims (6)

1. A method for cross-platform parallel system image backup in a cluster, characterized in that it comprises the following steps:
Step 1: after initial setup, start a plurality of node machines to be backed up and send the timed backup command;
Step 2: determine whether the scheduled time has arrived; if not, continue waiting; if so, proceed to the next step;
Step 3: generate the boot image used for remote backup and remotely restart the node machines, the boot image containing the system backup script;
Step 4: the node machines automatically perform the system backup according to the program in the system backup script in the boot image, and the systems are copied in full and in parallel to a remote node machine;
Step 5: after the backup is finished, each node machine restarts and automatically returns to the system it ran before the backup.
2. The method for cross-platform parallel system image backup in a cluster according to claim 1, characterized in that said step 3 further comprises:
enabling the background service for scheduled command execution on nodes 1 to N;
enabling the remote shell service on the backup server and putting the IP addresses of nodes 1 to N in the remote shell configuration file, allowing nodes 1 to N to access the backup server through the remote shell;
enabling the TFTP service on the backup server, allowing other nodes to download files over TFTP.
3. The method for cross-platform parallel system image backup in a cluster according to claim 2, characterized in that the console sends the system backup command, with its scheduled time, to nodes 1 to N in parallel; on receiving the timed system backup command sent by the backup server, each of nodes 1 to N extracts the time and the command, writes them in scheduled-command format into the scheduled-command configuration file, and restarts the related service.
4. The method for cross-platform parallel system image backup in a cluster according to claim 3, characterized in that the following operations are performed on the backup server by remote call:
mounting a shared file, initrd, on a temporary directory to produce a temporary file system;
inserting into the bin/init file under the temporary directory the commands that load the corresponding modules, the commands that activate the network card and configure this node's IP address, and the commands that back up this node's hard disk through RSH and compress the backup, thereby customizing the init file executed by the minimal operating system;
copying the corresponding adapter driver module into a subdirectory of the temporary directory;
unmounting the temporary file system, compressing the customized initrd file, renaming the compressed file as a boot image file named after the node, and placing it in the TFTP root directory;
generating the Preboot Execution Environment boot file according to Preboot Execution Environment rules, the boot file containing the name of the boot image file named after the node;
generating the DHCP configuration file containing the IP address and MAC information of nodes 1 to N;
waiting for all of nodes 1 to N to finish the above steps, determining whether any other DHCP server is running that might conflict and, if not, starting the DHCP service on the backup server;
sending the restart command to nodes 1 to N in parallel.
5. The method for cross-platform parallel system image backup in a cluster according to claim 4, characterized in that: after a node machine receives the restart instruction, it shuts down its operating system normally and restarts; the node machine first enters the Preboot Execution Environment and sends out a message containing the MAC information of its network interface; the DHCP server, which is the backup server, on receiving this MAC address information, checks whether the DHCP configuration file contains a matching entry and, if so, finds the IP address matched to this MAC and returns to the node machine its IP address, boot file and other information; the node machine then derives a mapping from this address and searches the root directory of the DHCP server for the corresponding file; once that file is found, the boot image file corresponding to this MAC address is looked up in the root directory of the TFTP service and the corresponding operation is performed.
6. The method for cross-platform parallel system image backup in a cluster according to claim 5, characterized in that the DHCP server downloads the operating system kernel and the boot image file to the node machine via TFTP; the script first configures the network interface with its original IP address and activates it, setting up the local route; the backup task then begins;
after all data has been read, a message is sent to the remote node to report that the system backup has finished, the MAC, IP address and other entries for this node machine are deleted from the DHCP server, and the DHCP service is restarted;
finally, the script executes the modified reboot command.
CN 03148518 2003-06-30 2003-06-30 Method for mirror backup of cluster platform cross parallel system Expired - Lifetime CN1276349C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 03148518 CN1276349C (en) 2003-06-30 2003-06-30 Method for mirror backup of cluster platform cross parallel system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 03148518 CN1276349C (en) 2003-06-30 2003-06-30 Method for mirror backup of cluster platform cross parallel system

Publications (2)

Publication Number Publication Date
CN1567198A CN1567198A (en) 2005-01-19
CN1276349C true CN1276349C (en) 2006-09-20

Family

ID=34472299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 03148518 Expired - Lifetime CN1276349C (en) 2003-06-30 2003-06-30 Method for mirror backup of cluster platform cross parallel system

Country Status (1)

Country Link
CN (1) CN1276349C (en)


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CX01 Expiry of patent term

Granted publication date: 20060920
