CN100589081C

CN100589081C - UNIX environment system restoration method

Info

Publication number: CN100589081C
Application number: CN200710037953A
Authority: CN
Inventors: 辛旻; 王磊; 陈晓武; 许能飞
Original assignee: Shanghai Baosight Software Co Ltd
Current assignee: Shanghai Baosight Software Co Ltd
Priority date: 2007-03-09
Filing date: 2007-03-09
Publication date: 2010-02-10
Anticipated expiration: 2027-03-09
Also published as: CN101261595A

Abstract

The invention discloses a system recovery method for a UNIX environment that keeps the system running normally when a disk array fails, or when a database stored on the disk array crashes or files stored on it are lost or damaged, thereby reducing the impact of such faults on the system and ensuring the system's effective operation rate. The method mainly comprises three steps: synchronization, switchover, and recovery. Synchronization is the step of automatically copying the disk array to a local disk on a schedule during routine operation; switchover is the step of switching the UNIX system from the disk array to the local disk to continue running when the disk array fails; and recovery is the step of switching the UNIX system from the local disk back to the disk array, returning it to the normal state, after the disk array has been restored to normal.

Description

System recovery method in a UNIX environment

Technical field

The present invention relates to a method for recovering a system in a UNIX environment when the system fails.

Background technology

As the capacity of disk arrays keeps growing, virtual storage and storage sharing have become possible, so that one disk array can be shared by several or even more systems. On the other hand, the consequences of a disk array failure are becoming more and more serious, and it is almost impossible to coordinate a time to shut the disk array down for maintenance. A method is therefore needed that keeps the UNIX system running normally even when the disk array has a problem.

Although the reliability of disk arrays and databases is improving, a failure, once it occurs, is no minor matter. At present, disk array reliability is generally ensured by using two disk arrays, but this raises cost significantly and reduces performance somewhat. Databases generally use a cluster structure to improve reliability, but this depends on cluster software and on the disk array. Even with both measures in place, it is still difficult to prevent program files from being damaged or lost, or faults such as a database crash.

Under present conditions, a disk array fault can only be handled by shutting down for maintenance, and a database crash can only be repaired with the system shut down. For damaged or lost files, the only options are to shut down and locate the problem step by step or to force a full restore. This causes immeasurable losses to the normal operation of important systems. In particular, with the advent of packaged software, a single file system may hold hundreds of thousands or even millions of files, so problems are often hard to locate, and restoring from tape takes a very long time. A method of quickly restoring system operation is needed to buy time for locating the problem and for full recovery.

Summary of the invention

The technical problem to be solved by the present invention is to provide a system recovery method for a UNIX environment that keeps the system running normally when the disk array fails, or when a database stored on the disk array crashes or files are lost or damaged, thereby reducing the impact of these faults on the system and ensuring the system's effective operation rate.

To solve the above technical problem, the invention provides a system recovery method for a UNIX environment that mainly comprises three steps: synchronization, switchover, and recovery. Synchronization is the step of automatically copying the disk array to the local disk on a schedule during routine operation; switchover is the step of switching the UNIX system from the disk array to the local disk to continue running when the disk array fails; recovery is the step of switching the UNIX system from the local disk back to the disk array, returning it to the normal state, after the disk array has been restored to normal.

The synchronization is initiated automatically on a schedule by a synchronization job set up in advance, and further comprises the following steps:

(1) take a snapshot of the current file system on the disk array;

(2) mount the snapshot obtained in step (1) as a snapshot file system, then pack the snapshot file system and place the package in the backup file system on the local disk; the local disk is a disk built into the computer on which the UNIX system runs, and both the backup file system and a file system corresponding to the disk array are set up on the local disk;

(3) remove the snapshot from the file system on the disk array;

(4) unpack the package stored in the backup file system on the local disk, overwriting the file system on the local disk that corresponds to the disk array.

By adopting the above technical scheme, the invention has the following beneficial effects. A local disk of sufficient capacity is built into the computer, and snapshot technology is used during routine operation to synchronize the contents of the local disk with the disk array automatically and periodically. When a disk array fault, a database crash, or file loss or damage occurs, the system is switched from the disk array to the local disk and continues to run; after the fault is cleared, the system is returned to its normal state of running on the disk array. The system can therefore keep running normally during a major fault, which buys time for troubleshooting, keeps the impact of the fault on the system to a minimum, effectively ensures the system's effective operation rate, and keeps cost under control.

Description of drawings

The present invention is described in further detail below with reference to the drawings and embodiments:

Fig. 1 is a flow chart of the method of the invention;

Fig. 2 is a schematic diagram of synchronization according to the invention;

Fig. 3 is a schematic diagram of switchover according to the invention;

Fig. 4 is a schematic diagram of recovery according to the invention.

Embodiment

Fig. 1 shows the flow of the system recovery method in a UNIX environment according to the invention, which mainly comprises three steps: synchronization, switchover, and recovery. Synchronization is the step of automatically synchronizing the disk array with the local disk on a schedule during routine operation, that is, automatically copying the disk array to the local disk for use at switchover. Switchover is the step of switching the UNIX system from the disk array to the local disk to continue running when the disk array fails. Recovery is the step of switching the UNIX system from the local disk back to the disk array, returning it to the normal state, after the disk array has been restored to normal. In the present invention these three steps complement one another, and none of them can be omitted.

In the present invention, the UNIX system should support file system snapshots. The local disk is a disk built into the computer on which the UNIX system runs; to ensure sufficient capacity, it is recommended to use two mirrored built-in disks as the local disk. To hold the synchronized content, a file system corresponding to the disk array should be set up on the local disk in advance; in the present invention it is called the local file system. In another embodiment, a backup file system should also be set up on the local disk in advance to hold the packed files temporarily during synchronization, which shortens synchronization time and helps synchronization succeed. The backup file system is also used to keep a compressed backup of the system, so that the system can be rolled back to any previous day and any individual file can be extracted. The backup file system should have enough space to hold these compressed backups. The disk array should have enough space to hold the snapshot's data changes during synchronization, that is, the changes the running system makes on the disk array.

In the present invention, the system program that manages scheduled jobs in the UNIX system (crontab) should also be used to set up a synchronization job and a sync-package cleanup job in advance. The synchronization job automatically initiates the routine synchronization on a schedule; the sync-package cleanup job automatically and periodically removes previously retained packages according to the retention policy. For the database, crontab should also be used to set up a database log output job in advance, so that logs are output at a fixed interval and no more than one interval's worth of data is lost when a sudden fault occurs. The interval can be set as low as 1 minute, but is generally set to 5 minutes so that recovery time is not unduly prolonged by the number of log files. So that the local disk can restore data from the logs when the disk array is damaged, the database should be configured for dual logging, with one copy stored on the disk array and one copy stored on the local disk; a link should be set up in the local log file system so that the logs are found at the same location after switchover.
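The three scheduled jobs described above could be declared as crontab entries along the following lines. This is a minimal sketch, not the patent's own configuration: the script paths under `/usr/local/recovery`, the 02:00 and 04:00 start times, and the function names are assumptions made for illustration. The log-output minutes are written as an explicit list because the `*/5` step form is an extension that not every UNIX cron accepts.

```python
def minute_list(interval: int) -> str:
    """Return an explicit crontab minute list such as '0,5,10,...,55'."""
    if not 1 <= interval <= 59 or 60 % interval != 0:
        raise ValueError("interval must be a divisor of 60 between 1 and 30")
    return ",".join(str(m) for m in range(0, 60, interval))


def build_crontab(sync_hour: int = 2, cleanup_hour: int = 4,
                  log_interval: int = 5) -> list:
    """Build crontab lines for the three jobs (all paths are hypothetical)."""
    return [
        # Daily synchronization, run at night while the system is idle.
        f"0 {sync_hour} * * * /usr/local/recovery/sync_job.sh",
        # Cleanup of old sync packages according to the retention policy.
        f"0 {cleanup_hour} * * * /usr/local/recovery/cleanup_packages.sh",
        # Database log output; the interval bounds the data lost on a fault.
        f"{minute_list(log_interval)} * * * * /usr/local/recovery/db_log_output.sh",
    ]


if __name__ == "__main__":
    print("\n".join(build_crontab()))
```

With the default 5-minute interval, at most about 5 minutes of database changes would be missing from the logs on the local disk when a sudden fault occurs.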

As shown in Fig. 2, the synchronization of the present invention is generally performed at night when the system is idle, preferably once a day, which keeps the time needed for switchover short and keeps each synchronization within its usual duration. The synchronization is initiated automatically on a schedule by the synchronization job set up in advance, and further comprises the following steps:

(1) Take a snapshot of the current file system on the disk array. Because the UNIX system is an online system whose data files change constantly, and it cannot be shut down for synchronization, the invention uses a snapshot to copy the system files on the disk array into the file system on the local disk. Taking the snapshot is very fast, and the AIX system guarantees the consistency of the files as they are at the moment of the backup; in dozens of experiments so far, the Oracle database, Tuxedo, and other software have all started normally.

(2) Pack the snapshot file system and place it on the local disk. That is, mount the snapshot obtained in step (1) as a snapshot file system, then pack the snapshot file system and place the package in the backup file system on the local disk. Packing makes the snapshot file system easier to store and to compress. If load and time allow, and/or space in the backup file system is tight, the snapshot file system can also be compressed while it is packed.

(3) Remove the snapshot from the file system on the disk array. Because the system keeps running and producing new data during synchronization, the snapshot should be removed immediately after packing is finished.

(4) Unpack the package stored in the backup file system on the local disk, that is, unpack the package so that it overwrites the corresponding file system on the local disk. If much has changed since the last synchronization, the file system should be cleaned first and the package unpacked afterwards; this cleaning can be carried out on a schedule by the sync-package cleanup job set up in advance. Otherwise the package can be unpacked directly.
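The four synchronization steps can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: creating and removing the snapshot is operating-system specific, so it is passed in as two hypothetical callables (`create_snapshot`, which returns the mount point of the snapshot file system, and `remove_snapshot`), and Python's standard `tarfile` module stands in for whatever packing tool the system actually uses.

```python
import shutil
import tarfile
from pathlib import Path
from typing import Callable


def synchronize(create_snapshot: Callable[[], Path],
                remove_snapshot: Callable[[], None],
                backup_fs: Path, local_fs: Path, package_name: str,
                compress: bool = False, clean_first: bool = False) -> Path:
    """Copy a snapshot of the disk-array file system onto the local disk."""
    # Steps (1) and (2): take the snapshot, mount it, and pack it into
    # the backup file system on the local disk.
    snapshot_mount = create_snapshot()
    package = backup_fs / package_name
    try:
        with tarfile.open(package, "w:gz" if compress else "w") as tar:
            tar.add(snapshot_mount, arcname=".")
    finally:
        # Step (3): remove the snapshot as soon as packing has ended.
        remove_snapshot()

    # Optional cleaning when much has changed since the last synchronization.
    if clean_first:
        for child in local_fs.iterdir():
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)
            else:
                child.unlink()

    # Step (4): unpack the package over the local file system.
    with tarfile.open(package) as tar:
        if hasattr(tarfile, "fully_trusted_filter"):
            # The package was produced by this same job, so it is trusted.
            tar.extractall(local_fs, filter="fully_trusted")
        else:
            tar.extractall(local_fs)
    return package
```

Keeping each day's package under a distinct `package_name` is one way the backup file system could retain the compressed backups that the description mentions for rolling back to an earlier day.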

Switchover applies when one of the following situations occurs: the disk array is damaged and is difficult to repair in a short time; the CLUSTER software has switched over successfully but the system is still abnormal; files are suspected of being corrupted but are difficult to locate; or a full restore from tape is confirmed to be necessary but the users cannot wait. Provided that synchronization is confirmed to have been running normally and that the switchover was rehearsed before the system went live, the switchover shown in Fig. 3 can be carried out through the following steps:

(1) Comment out each of the relevant jobs that were set up in the system in advance with crontab. To make recovery faster, the synchronization job, the sync-package cleanup job, and the database log output job need to be stopped while the system runs in the switched state.

(2) Manually stop the CLUSTER software on both sides in forced (Force) mode. This ensures that the switched state is not disturbed and prevents the CLUSTER software from acting automatically. Stopping the CLUSTER software in Force mode also means that there is no need to manually change NIC addresses, mount file systems, and so on.

(3) Stop all applications on the disk array, including the database. If the system is already abnormal at this point, there is no guarantee that the file systems used by each application can be unmounted (umount), so processes can be killed (kill) if necessary.

(4) To avoid duplicate names, and to make it easier to repair the system from another machine, unmount the file systems on the disk array and stop using the disk array.

(5) Because the system cannot mount two file systems with the same name, change the mount point of the corresponding file systems on the local disk, then mount these file systems again.

(6) Recover the database on the local disk. The present invention recovers the database from the logs that the database log output job, set up in advance, outputs on a schedule. Because a log is normally output every 5 minutes, this step performs the recovery by running a script after startup. It is normally the most time-consuming step of the whole switchover.

(7) Start the other applications on the local disk.

(8) Confirm that the system running on the local disk is normal.
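The switchover is a strictly ordered procedure, which the sketch below models as a list of named steps run in sequence and halted at the first failure. It also shows one way step (1) could comment out the scheduled jobs in a crontab listing. All names here (`comment_out_jobs`, `run_in_order`, the job markers) are hypothetical, and the real actions behind each step (stopping the cluster software, remounting file systems, replaying logs) are site-specific and deliberately left as injected callables.

```python
from typing import Callable, List, Tuple

SWITCHOVER_STEPS = [
    "comment out the crontab jobs",
    "force-stop the CLUSTER software on both sides",
    "stop all applications on the disk array, including the database",
    "unmount the disk-array file systems and stop using the disk array",
    "change the mount points of the local file systems and remount them",
    "recover the database on the local disk from the output logs",
    "start the other applications on the local disk",
    "confirm that the system runs normally on the local disk",
]


def comment_out_jobs(crontab_text: str, job_markers: List[str]) -> str:
    """Prefix '#' to every active crontab line that mentions a job marker."""
    result = []
    for line in crontab_text.splitlines():
        active = bool(line.strip()) and not line.lstrip().startswith("#")
        if active and any(marker in line for marker in job_markers):
            line = "#" + line
        result.append(line)
    return "\n".join(result) + "\n"


def run_in_order(steps: List[Tuple[str, Callable[[], bool]]]) -> List[str]:
    """Run the steps in order; stop at the first one that reports failure."""
    completed = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"switchover stopped at step: {name}")
        completed.append(name)
    return completed
```

Stopping at the first failed step reflects the dependency between the steps: for example, the local file systems cannot take over the production mount points in step (5) until the disk-array file systems have been unmounted in step (4).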

Recovery applies when all of the following hold: the fault has been confirmed as repaired on another machine or in the development environment; the database has been restored 1 to 2 hours in advance, by hot backup or a similar method, to a state consistent with the currently running database; there is enough planned downtime, more than 2 hours; and the relevant technical staff are on site. In that case, the recovery shown in Fig. 4 can be carried out through the following steps:

(1) Stop all applications on the local disk, including the database. In this step it is best to make sure that every application, especially the database, stops normally.

(2) Because the system cannot mount two file systems with the same name, return the local file systems to their original state, that is, change them back to their original mount points and mount them again.

(3) Activate the disk array and mount its file systems.

(4) Reverse-synchronize the relevant files. This mainly means copying to the disk array the small number of log files that were generated on the local disk after the recovery prerequisites were met.

(5) Recover the database on the disk array. Because the database has already been restored to a point before the shutdown, few logs remain to be applied and recovery is fast.

(6) Start the applications on the disk array.

(7) Confirm that the system running on the disk array is normal.

(8) Restart the CLUSTER software. Because the CLUSTER software was stopped in Force mode at switchover, starting it only starts the CLUSTER software itself.

(9) Re-enable the relevant crontab jobs, including the synchronization job, the sync-package cleanup job, and the database log output job.
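The four prerequisites for starting the recovery can be expressed as a simple pre-flight check. The sketch below is illustrative only: the `RestoreContext` fields and the function name are assumptions, and the only threshold taken from the text is the planned downtime of more than 2 hours.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RestoreContext:
    fault_repaired: bool          # confirmed on another machine or in development
    db_prerestored: bool          # array database brought up to date in advance
    planned_downtime_hours: float
    staff_on_site: bool


def unmet_preconditions(ctx: RestoreContext) -> List[str]:
    """Return the prerequisites that are not yet satisfied (empty means go)."""
    problems = []
    if not ctx.fault_repaired:
        problems.append("fault not confirmed as repaired")
    if not ctx.db_prerestored:
        problems.append("database on the disk array not restored in advance")
    if not ctx.planned_downtime_hours > 2:
        problems.append("planned downtime must exceed 2 hours")
    if not ctx.staff_on_site:
        problems.append("relevant technical staff not on site")
    return problems


if __name__ == "__main__":
    ctx = RestoreContext(True, True, 1.5, True)
    for problem in unmet_preconditions(ctx):
        print("blocked:", problem)
```

Returning the list of unmet conditions, rather than a single boolean, lets an operator see exactly what is still missing before the nine recovery steps begin.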

Claims

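1. A system recovery method in a UNIX environment, characterized by comprising three steps: synchronization, switchover, and recovery; wherein the synchronization is the step of automatically copying the disk array to the local disk on a schedule during routine operation; the switchover is the step of switching the UNIX system from the disk array to the local disk to continue running when the disk array fails; the recovery is the step of switching the UNIX system from the local disk back to the disk array, returning it to the normal state, after the disk array has been restored to normal;

the synchronization is initiated automatically on a schedule by a synchronization job set up in advance, and further comprises the following steps:

(1) take a snapshot of the current file system on the disk array;

(2) mount the snapshot obtained in step (1) as a snapshot file system, then pack the snapshot file system and place the package in the backup file system on the local disk; the local disk is a disk built into the computer on which the UNIX system runs, and both the backup file system and a file system corresponding to the disk array are set up on the local disk;

(3) remove the snapshot from the file system on the disk array;

(4) unpack the package stored in the backup file system on the local disk, overwriting the file system on the local disk that corresponds to the disk array.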

2. The system recovery method in a UNIX environment according to claim 1, characterized in that the local disk consists of two built-in disks.

3. The system recovery method in a UNIX environment according to claim 1, characterized in that, when the snapshot file system of the disk array is packed in step (2), it is also compressed.

4. The system recovery method in a UNIX environment according to claim 1, characterized in that, in step (4), the package is unpacked after a sync-package cleanup job set up in advance has automatically cleaned the file system on the local disk on a schedule.

5. The system recovery method in a UNIX environment according to claim 1, characterized in that the switchover further comprises:

(1) commenting out each of the relevant jobs set up in advance with the system program that manages scheduled jobs in the UNIX system;

(2) stopping the CLUSTER software on both sides in forced mode;

(3) stopping all applications on the disk array, including the database;

(4) unmounting the file systems of the disk array and stopping use of the disk array;

(5) changing the mount point of the file system on the local disk that corresponds to the disk array, and remounting that file system;

(6) recovering the database on the local disk;

(7) starting the other applications on the local disk;

(8) confirming that the UNIX system running on the local disk is normal.

6. The system recovery method in a UNIX environment according to claim 5, characterized in that step (6) recovers the database from the logs that the database log output job outputs on a schedule.

7. The system recovery method in a UNIX environment according to claim 1, characterized in that the recovery further comprises:

(1) stopping all applications on the local disk, including the database;

(2) changing the file system on the local disk that corresponds to the disk array back to its original mount point, and remounting that file system;

(3) activating the disk array for use and mounting the file systems of the disk array;

(4) reverse-synchronizing the relevant files;

(5) recovering the database on the disk array;

(6) starting the applications on the disk array;

(7) confirming that the UNIX system running on the disk array is normal;

(8) restarting the CLUSTER software;

(9) re-enabling the relevant jobs set up with the system program that manages scheduled jobs in the UNIX system.