CN104391764A - Computer fault-tolerant method and computer fault-tolerant system - Google Patents
- Publication number: CN104391764A (application CN201410632804.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Hardware Redundancy (AREA)
Abstract
The invention provides a computer fault-tolerant method and a computer fault-tolerant system. The method is applied to a computer fault-tolerant system comprising main equipment and standby equipment, each of which comprises a guest virtual machine built on a virtualization kernel. The method comprises the following steps: presetting trigger conditions for virtual memory synchronization; when a trigger condition occurs, completing one synchronization of the virtual memory of the main equipment and the virtual memory of the standby equipment; and when the main equipment fails, stopping the guest virtual machine of the main equipment and starting the guest virtual machine of the standby equipment. With this technical scheme, the risk of fault-induced shutdown is avoided: when the main equipment fails, the service is transferred directly to the standby equipment, applications continue without interruption, no data is lost, and reliability is higher. The scheme is also independent of the operating system layer and is therefore more widely applicable.
Description
Technical field
The present invention relates to a computer safety system, and in particular to a computer fault-tolerant method and system.
Background
With the rapid development of information technology, and especially the rise of the Internet of Things and cloud computing, computer systems have an ever greater influence on social and economic development. Computer systems are used not only to manage and maintain data, but also in industrial control and production execution. In manufacturing process control systems, factory manufacturing execution systems, and enterprise network data center systems, for example, computer systems have become the core of industrial IT infrastructure. Components of a computer system may fail, and such a failure inevitably affects the operation of the system and may even cause it to crash. How to ensure that a computer system keeps working normally, or recovers normal operation, when a fault occurs has therefore become a key issue in computer system development.
Many solutions have been proposed for the safety and stability problems caused by computer system faults, including Microsoft's MSCS failover clustering, the RoseHA cluster software, and VMware's virtualization HA scheme. All of these solutions rely on failover between a master and a slave device to quickly recover the client system after the master fails: when the master fails, the slave takes over and restarts the critical application services. This design has obvious defects. Restarting application services during the master-slave switchover interrupts the business system, and such systems are often structurally complex and impose significant restrictions on the software and the application environment.
In view of this, how to provide a computer fault-tolerant method and system that keeps the services of a computer system running continuously and without interruption when the system fails has become a problem that those skilled in the art urgently need to solve.
Summary of the invention
In view of the above shortcomings of the prior art, the object of the present invention is to provide a computer fault-tolerant method and system that solve the prior-art problem of service operation being interrupted when a computer system fails.
To achieve the above object and other related objects, the invention provides a computer fault-tolerant method, applied to a computer fault-tolerant system comprising a host apparatus and a stand-by equipment, each of which comprises a guest virtual machine built on a virtualization kernel. The fault-tolerant method, which is based on virtual memory synchronization, comprises: presetting a trigger condition for virtual memory synchronization; and, when the trigger condition occurs, completing one virtual memory synchronization between the host apparatus and the stand-by equipment.
The fault-tolerant method based on virtual memory synchronization further comprises: when the host apparatus fails, stopping the guest virtual machine of the host apparatus and starting the guest virtual machine of the stand-by equipment, so that the stand-by equipment takes over the work of the host apparatus.
Completing one virtual memory synchronization between the host apparatus and the stand-by equipment comprises: pausing the guest virtual machine of the host apparatus and making the memory page content of the guest virtual machine of the stand-by equipment fully consistent with that of the guest virtual machine of the host apparatus; then resuming the guest virtual machine of the host apparatus, which completes the I/O operations covered by this virtual memory synchronization, while ensuring that the disk data of the host apparatus and of the stand-by equipment are fully consistent.
Making the page content of the guest virtual machine of the stand-by equipment fully consistent with that of the guest virtual machine of the host apparatus comprises: determining the virtual memory pages of the guest virtual machine whose content changed between the previous occurrence of the trigger condition and the current one, and transmitting the content of all the determined pages to the stand-by equipment, so that the corresponding virtual memory pages of the guest virtual machine in the stand-by equipment are consistent with the determined pages in the host apparatus.
The trigger condition for virtual memory synchronization comprises: an I/O state change occurring in the guest virtual machine of the host apparatus.
The present invention also provides a computer fault-tolerant system comprising a host apparatus and a stand-by equipment, each of which comprises a synchronizing software module, a fault management module, and a guest virtual machine module. When the computer fault-tolerant system is running, the synchronizing software module, fault management module, and guest virtual machine module of the host apparatus and of the stand-by equipment all run on a virtualization kernel, where they correspond respectively to a synchronizing software process, a fault management process, and a guest virtual machine. The guest virtual machine runs the application programs; the guest virtual machine of the host apparatus is in the running state, while that of the stand-by equipment runs synchronously but cannot be accessed or managed. The synchronizing software process presets the trigger condition for virtual memory synchronization and, when the trigger condition occurs, performs the virtual memory synchronization between the host apparatus and the stand-by equipment; the synchronizing software process of the host apparatus is in the running state, while that of the stand-by equipment runs synchronously but cannot be accessed or managed. The fault management process manages, and handles fault recovery for, the host apparatus hardware, the guest virtual machine, and the synchronizing software process; the fault management process of the host apparatus is in the running state, while that of the stand-by equipment runs synchronously but cannot be accessed or managed.
The virtual memory synchronization is implemented as follows: the guest virtual machine of the host apparatus is paused; the virtual memory pages of the guest virtual machine whose content changed between the previous and the current occurrence of the trigger condition are determined and synchronized, so that the corresponding virtual memory pages of the guest virtual machine in the stand-by equipment are consistent with the changed pages in the host apparatus; at the same time, the host apparatus and the stand-by equipment write the page content to their respective logical disk volumes and then release the I/O buffer; the guest virtual machine of the host apparatus is then resumed and completes the I/O read and write operations covered by this virtual memory synchronization, while the disk data of the host apparatus is kept consistent with that of the stand-by equipment.
The trigger condition for virtual memory synchronization comprises: an I/O state change occurring in the guest virtual machine of the host apparatus.
The maximum read/write speed of storage I/O operations on the guest virtual machine is 50 MB per second.
The maximum read/write speed of network I/O operations on the guest virtual machine is 5 MB per second.
When the fault management process of the host apparatus detects that the host apparatus has failed, the guest virtual machine of the host apparatus is stopped, the guest virtual machine of the stand-by equipment is started, and the synchronizing software process and fault management process of the stand-by equipment are started, so that the stand-by equipment takes over the work of the host apparatus.
As described above, the computer fault-tolerant method and system of the present invention have the following beneficial effects: the risk of fault-induced shutdown is avoided. When the host apparatus fails, the service migrates to the stand-by equipment; because the memory data of the stand-by equipment is kept synchronized with that of the host apparatus at each checkpoint, the operating system and software programs continue to run unaffected, with zero seconds of service interruption, uninterrupted applications, and no data loss, so reliability is higher. Moreover, the technical scheme is independent of the operating system layer and is therefore more widely applicable.
Brief description of the drawings
Fig. 1 is a schematic flow chart of an embodiment of the computer fault-tolerant method of the present invention.
Fig. 2 is a schematic diagram of virtual memory synchronization in an embodiment of the computer fault-tolerant method of the present invention.
Fig. 3 is a module diagram of an embodiment of the computer fault-tolerant system of the present invention.
Fig. 4 is a structural diagram of an embodiment of the computer fault-tolerant system of the present invention.
Description of reference numerals
1 Computer fault-tolerant system
11 Host apparatus
111 Guest virtual machine module
112 Synchronizing software module
113 Fault management module
12 Stand-by equipment
121 Guest virtual machine module
122 Synchronizing software module
123 Fault management module
S1 to S3 Steps
Detailed description of the embodiments
Embodiments of the present invention are described below by way of specific examples, and those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed in various ways, based on different viewpoints and applications, without departing from the spirit of the invention.
It should be noted that the drawings provided with these embodiments illustrate the basic concept of the invention only schematically. They show only the components relevant to the invention rather than the number, shape, and size of components in an actual implementation; in practice the form, quantity, and proportion of each component may vary freely, and the component layout may be more complex.
The invention provides a fault-tolerant method based on virtual memory synchronization, applied to a computer fault-tolerant system comprising a host apparatus and a stand-by equipment. The host apparatus may be a master server, a master machine, or the like; the stand-by equipment is a device with the same hardware and software configuration as the host apparatus, such as a slave server or slave machine. The host apparatus and the stand-by equipment each comprise a guest virtual machine built on a virtualization kernel. The host apparatus (master) and the stand-by equipment (slave) are usually cascaded over synchronous Ethernet, a non-transparent bridge (NTB), or InfiniBand, and the link to the slave node must provide sufficient communication bandwidth for checkpoints (memory page data synchronization). In one embodiment, as shown in Fig. 1, the fault-tolerant method based on virtual memory synchronization comprises:
Step S1: preset the trigger condition for virtual memory synchronization. The trigger condition may comprise an I/O state change occurring in the guest virtual machine on the host apparatus. An I/O state change covers changes in storage I/O state or network I/O state and is not limited to data changes; it also includes changes in I/O requests and other kinds of state change. In one embodiment, the trigger condition is an I/O state change of the guest virtual machine system on the host apparatus, including changes in disk and network data, resource occupation state, time state, link state, and so on. Specifically, the central processing unit of the host apparatus starts the synchronizing software process, which continuously monitors the I/O state of the guest virtual machine; each change generates a virtual memory synchronization trigger condition, which determines a checkpoint at which virtual memory page synchronization is performed.
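The monitoring described in step S1 can be sketched as follows. This is a minimal illustration only: the names `IOKind`, `IOEvent`, and `CheckpointTrigger` are hypothetical and do not come from the patent, and the set of event kinds that trigger a checkpoint is an assumed configuration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class IOKind(Enum):
    # Kinds of I/O state change named in the description (illustrative subset).
    DISK = auto()
    NETWORK = auto()
    LINK_STATE = auto()
    TIMER = auto()


@dataclass
class IOEvent:
    kind: IOKind
    description: str


class CheckpointTrigger:
    """Hypothetical helper: decides whether an I/O state change triggers a checkpoint."""

    def __init__(self, triggering_kinds):
        # Assumption: the preset trigger condition is a set of event kinds.
        self.triggering_kinds = set(triggering_kinds)
        self.checkpoints = 0

    def on_io_event(self, event):
        # Every monitored change is classified; only configured kinds trigger.
        if event.kind in self.triggering_kinds:
            self.checkpoints += 1
            return True
        return False


trigger = CheckpointTrigger({IOKind.DISK, IOKind.NETWORK})
fired_on_disk = trigger.on_io_event(IOEvent(IOKind.DISK, "sector write"))
fired_on_timer = trigger.on_io_event(IOEvent(IOKind.TIMER, "tick"))
```

Presetting the condition as data rather than code keeps the monitoring loop unchanged when the set of triggering events is tuned.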
Step S2: when the trigger condition occurs, complete one virtual memory synchronization between the host apparatus and the stand-by equipment. Specifically, when the trigger condition occurs, the guest virtual machine of the host apparatus is paused, that is, the master is held in a "prepared" state; this state is called a checkpoint. The interval from one occurrence of the trigger condition to the next, that is, from one checkpoint to the next, is called the checkpoint cycle.
In one embodiment, the virtual memory pages of the guest virtual machine whose content changed between the previous and the current occurrence of the trigger condition are determined, and the content of all the determined pages is transmitted to the stand-by equipment, so that the corresponding virtual memory pages of the guest virtual machine in the stand-by equipment are consistent with the determined pages in the host apparatus. Specifically, the host apparatus records the virtual memory page modifications that occur during the checkpoint cycle and, when synchronizing with the virtual memory of the stand-by equipment, synchronizes only the pages that were modified, thereby making the page content of the guest virtual machine of the stand-by equipment fully consistent with that of the guest virtual machine of the host apparatus. This is the approach usually adopted for virtual memory synchronization.
In another embodiment, when synchronizing with the virtual memory of the stand-by equipment, the content of all virtual memory pages of the guest virtual machine in the host apparatus is transmitted to the stand-by equipment, so that every virtual memory page of the guest virtual machine in the stand-by equipment is consistent with the corresponding page in the host apparatus; that is, all virtual memory pages in the host apparatus are synchronized.
At the same time, the host apparatus and the stand-by equipment write the page content to their respective logical disk volumes in block form, with incremental synchronization implemented through a disk sector bitmap index. The central processing unit (CPU) of the stand-by equipment then returns an ACK signal to the CPU of the host apparatus, confirming that disk synchronization is complete and the data are consistent, after which the CPU releases the I/O buffer. The guest virtual machine of the host apparatus is then resumed and completes the I/O read and write operations covered by this virtual memory synchronization, and the disk data of the host apparatus is kept consistent with that of the stand-by equipment.
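The incremental variant of step S2, in which only pages modified during the checkpoint cycle are sent, can be illustrated with a small in-memory model. This is a sketch under stated assumptions: `GuestMemory` and `checkpoint` are hypothetical names, the 4096-byte page size is assumed, and a real implementation would obtain dirty pages from the virtualization kernel rather than from a Python set.

```python
class GuestMemory:
    """Toy model of guest memory with dirty-page tracking (illustrative only)."""

    def __init__(self, num_pages, page_size=4096):
        self.page_size = page_size
        self.pages = [bytes(page_size) for _ in range(num_pages)]
        self.dirty = set()  # pages modified since the last checkpoint

    def write(self, page_no, data):
        # Pad or truncate to one page and record the modification.
        self.pages[page_no] = data.ljust(self.page_size, b"\0")[: self.page_size]
        self.dirty.add(page_no)


def checkpoint(primary, standby):
    """Copy pages dirtied during this checkpoint cycle; return how many were sent.

    Assumes the primary guest is paused while this runs, as in step S2.
    """
    sent = sorted(primary.dirty)
    for page_no in sent:
        standby.pages[page_no] = primary.pages[page_no]
    primary.dirty.clear()  # the next checkpoint cycle starts with a clean record
    return len(sent)


primary = GuestMemory(num_pages=8)
standby = GuestMemory(num_pages=8)
primary.write(2, b"abc")
primary.write(5, b"xyz")
pages_sent = checkpoint(primary, standby)
```

The full-copy embodiment corresponds to treating every page as dirty, which is simpler but transfers far more data per checkpoint.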
A checkpoint is generated around each I/O state change, and virtual memory page synchronization is performed. The guest virtual machine system on the host apparatus manages disk and network through an "I/O buffer": only after synchronization within a checkpoint cycle has finished are new I/O requests released, and the next checkpoint cycle is prepared. The number of checkpoints per unit time is called the checkpoint rate, which can be expressed in cycles per second. In a computer system, the characteristics of the application load determine the checkpoint frequency. How many memory pages are modified within a checkpoint cycle depends mainly on how often the I/O state of the guest virtual machine system on the host apparatus changes. With fewer checkpoint cycles per second, the guest operating system has more capacity for heavy computation. A higher checkpoint rate indicates measurable resource occupation; fewer than 200 cycles per second usually means the system is not busy. Checkpoint cycles consume physical machine memory and synchronization network bandwidth. A higher checkpoint rate shortens both the I/O buffer release period and the I/O network delay: as I/O state changes (such as network activity) increase, checkpoint cycles become more frequent and the resulting delay decreases. Different types of workload usually have different memory synchronization frequencies, for example:
- comprehensive compute test: 1-10 times per second
- memory test: 10-50 times per second
- file copy: 5-20 times per second
- SQL query: 10-30 times per second
- web file transfer: 50-200 times per second
- moderate SQL transactions: 50-500 times per second
- frequent SQL transactions: 500-1500 times per second
To ensure effective guest application performance, flow control is applied to virtual I/O request responses for both disk and network. In one embodiment, the limit is a maximum file transfer bandwidth of 5 MB per second for the network (per virtual network adapter) and a maximum file read/write rate of 50 MB per second for disk.
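The description states the flow-control limits but not the mechanism. One common way to enforce such limits is a token bucket; the sketch below uses that technique as an assumption, not as the patent's method. `TokenBucket` and its parameters are hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```python
MB = 1024 * 1024
NETWORK_LIMIT = 5 * MB   # bytes per second, per virtual network adapter
DISK_LIMIT = 50 * MB     # bytes per second


class TokenBucket:
    """Token-bucket rate limiter (an assumed mechanism, not taken from the patent)."""

    def __init__(self, rate_bytes_per_s, capacity=None):
        self.rate = rate_bytes_per_s
        # Assumption: by default the burst size equals one second of traffic.
        self.capacity = capacity if capacity is not None else rate_bytes_per_s
        self.tokens = self.capacity
        self.last = 0.0

    def try_send(self, nbytes, now):
        # Refill according to elapsed time, capped at the bucket capacity.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller should hold the request in the I/O buffer


net = TokenBucket(NETWORK_LIMIT)
first = net.try_send(5 * MB, now=0.0)   # uses the full initial burst
second = net.try_send(1 * MB, now=0.1)  # only about 0.5 MB refilled so far
third = net.try_send(1 * MB, now=0.3)   # enough tokens have accumulated
```

A request that is refused would stay queued until a later attempt succeeds, which bounds the sustained rate at the configured limit.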
In one embodiment, as shown in Fig. 2, the I/O state of the host apparatus, including storage and network I/O state, is tied to the memory synchronization process. Network state is held in the I/O buffer of the active primary node (the host apparatus). Storage reads and writes are executed on the primary node, but storage write state is held in the I/O buffer of the secondary node (the stand-by equipment). Every I/O state change is monitored; its type is determined, and it is judged whether a checkpoint needs to be triggered, that is, whether memory page synchronization is needed. If so, the guest virtual machine system running on the primary node is stopped, and the memory page modifications made in this checkpoint cycle, since the previous checkpoint, are collected and sent to the memory synchronization process on the secondary node. Once the modified memory page content has been captured from the primary node, the guest virtual machine system (VM) resumes running. The memory synchronization process running on the secondary node maps the modified page content into local memory and triggers a group of execution requests that release, respectively, the network transmission requests in the primary node's I/O buffer and the disk write requests in the secondary node's I/O buffer. Note in particular that disk write requests are held only on the secondary node, so the secondary node's disk content represents the mirrored data "before the checkpoint was executed". If the primary node crashes before the checkpoint completes, the secondary node retains the data as of the previous completed checkpoint and regenerates the I/O transmission requests. During recovery, the consistency of the data on both sides can be ensured through disk mirroring. This approach avoids repeated writes of disk data during failover.
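The hold-and-release behaviour of the I/O buffers described above can be modelled as two queues that are drained only when a checkpoint is acknowledged. This is a simplified sketch: `IOBuffer` and `run_checkpoint` are hypothetical names, and the acknowledgement is reduced to a boolean.

```python
from collections import deque


class IOBuffer:
    """Holds requests until the checkpoint that covers them has completed."""

    def __init__(self):
        self.held = deque()
        self.released = []

    def hold(self, request):
        self.held.append(request)

    def release_all(self):
        # Release in arrival order so request ordering is preserved.
        count = len(self.held)
        while self.held:
            self.released.append(self.held.popleft())
        return count


def run_checkpoint(primary_net_buf, secondary_disk_buf, acknowledged):
    """Release held network and disk-write requests once the checkpoint is confirmed.

    Assumption: `acknowledged` stands in for the secondary node confirming that the
    modified pages were mapped into its local memory.
    """
    if not acknowledged:
        return 0  # nothing becomes externally visible
    return primary_net_buf.release_all() + secondary_disk_buf.release_all()


net_buf = IOBuffer()    # network requests held on the primary node
disk_buf = IOBuffer()   # disk write requests held on the secondary node
net_buf.hold("packet 1")
disk_buf.hold("write sector 7")
released_before_ack = run_checkpoint(net_buf, disk_buf, acknowledged=False)
released_after_ack = run_checkpoint(net_buf, disk_buf, acknowledged=True)
```

Because nothing is released before acknowledgement, a primary failure in the middle of a checkpoint leaves the outside world and the secondary disk consistent with the previous completed checkpoint.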
In one embodiment, the fault-tolerant method based on virtual memory synchronization further comprises:
Step S3: when the host apparatus fails, stop the guest virtual machine of the host apparatus and start the guest virtual machine of the stand-by equipment, so that the stand-by equipment takes over the work of the host apparatus. Specifically, when the fault management module detects that the host apparatus has failed, the guest virtual machine of the host apparatus is stopped, and the guest virtual machine of the stand-by equipment is started together with its external I/O communication and begins accepting client access and management operations; this completes one transfer of access control for the guest virtual machine. In one embodiment, the I/O buffer is not used during guest virtual machine memory migration. After a successful memory migration there is a brief pause in network requests while the guest virtual machine runs on the former secondary node and confirms the checkpoint cycle, but this pause lasts less than 1 millisecond, which is negligible for service network transmission; Ethernet link state and TCP data transmission are unaffected. Service switchover with zero seconds of interruption is thus achieved when any host failure occurs. At this point, because the host apparatus (master) node has failed, the virtual memory synchronization process stops running and the guest virtual machine no longer runs in fault-tolerant mode; this is called the "degraded" state, in which the system works in single-machine mode and disk I/O (write) replication stops.
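The failover in step S3 amounts to a small set of state changes, sketched below. `Node` and `FaultTolerantPair` are hypothetical names used only for illustration, and fault detection itself is outside the scope of the sketch.

```python
class Node:
    def __init__(self, name, vm_running):
        self.name = name
        self.vm_running = vm_running
        self.healthy = True


class FaultTolerantPair:
    """Toy model of the host apparatus / stand-by equipment pair."""

    def __init__(self):
        self.primary = Node("host apparatus", vm_running=True)
        self.standby = Node("stand-by equipment", vm_running=False)
        self.sync_active = True        # virtual memory synchronization is running
        self.mode = "fault-tolerant"

    def on_fault_detected(self):
        # Stop the guest VM on the failed host and start it on the stand-by.
        self.primary.healthy = False
        self.primary.vm_running = False
        self.standby.vm_running = True
        # With one node left, synchronization and disk write replication stop.
        self.sync_active = False
        self.mode = "degraded"
        return self.standby


pair = FaultTolerantPair()
active = pair.on_fault_detected()
```

In the degraded state there is no second node to synchronize with, so fault tolerance is restored only once a healthy peer rejoins.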
During virtual memory synchronization, the guest virtual machine can be in one of the following activity states:
1. Starting: the period after the guest virtual machine system is started. Once network communication is established, the state changes to "running"; the process before that point is called "starting".
2. Stopping/stopped: a shutdown request has been sent to the guest virtual machine operating system. Until the virtual machine disconnects from the internal bus, it is in the "stopping" state; after that, it is defined as stopped.
3. Running: once the network communication and disk read/write state of the guest virtual machine are confirmed to be established, the state is "running".
4. Migrating: while access to the guest virtual machine is being moved from the host apparatus node to the stand-by equipment node, until it has been successfully transferred and runs on the secondary node, the state is defined as "migrating".
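The four activity states can be written as an enumeration with a transition table. The state names follow the list above; the set of allowed transitions is an assumption inferred from the descriptions and is not spelled out in the patent.

```python
from enum import Enum


class VMState(Enum):
    STARTING = "starting"
    STOPPING = "stopping"
    RUNNING = "running"
    MIGRATING = "migrating"


# Assumed transitions: starting leads to running, a running guest may be
# stopped or migrated, and a migrated guest runs again on the secondary node.
ALLOWED = {
    VMState.STARTING: {VMState.RUNNING},
    VMState.RUNNING: {VMState.STOPPING, VMState.MIGRATING},
    VMState.MIGRATING: {VMState.RUNNING},
    VMState.STOPPING: set(),
}


def transition(current, new):
    """Return the new state, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new


state = transition(VMState.STARTING, VMState.RUNNING)
state = transition(state, VMState.MIGRATING)
state = transition(state, VMState.RUNNING)
```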
The present invention also provides a computer fault-tolerant system. In one embodiment, as shown in Fig. 3, the computer fault-tolerant system 1 comprises a host apparatus 11 and a stand-by equipment 12. The host apparatus 11 may be a master server, a master machine, or the like; the stand-by equipment 12 is a device with the same hardware and software configuration as the host apparatus 11, such as a slave server or slave machine. The host apparatus 11 and the stand-by equipment 12 each comprise a guest virtual machine built on a virtualization kernel. The host apparatus 11 (master, master server, primary node, etc.) and the stand-by equipment 12 (slave, slave server, secondary node, etc.) are usually cascaded over synchronous Ethernet, a non-transparent bridge (NTB), or InfiniBand, and the link to the secondary node must provide sufficient communication bandwidth for checkpoints (memory page data synchronization).
The host apparatus 11 comprises a guest virtual machine module 111, a synchronizing software module 112, and a fault management module 113; the stand-by equipment 12 comprises a guest virtual machine module 121, a synchronizing software module 122, and a fault management module 123. When the computer fault-tolerant system 1 is running, the synchronizing software modules (112 and 122), fault management modules (113 and 123), and guest virtual machine modules (111 and 121) of the host apparatus 11 and the stand-by equipment 12 all run on a virtualization kernel, where they correspond respectively to a synchronizing software process, a fault management process, and a guest virtual machine. That is, the synchronizing software module 112, fault management module 113, and guest virtual machine module 111 of the host apparatus 11 all run on the virtualization kernel of the host apparatus 11 and correspond respectively to the synchronizing software process (112), fault management process (113), and guest virtual machine (111) on that kernel. Likewise, the synchronizing software module 122, fault management module 123, and guest virtual machine module 121 of the stand-by equipment 12 all run on the virtualization kernel of the stand-by equipment 12 and correspond respectively to the synchronizing software process (122), fault management process (123), and guest virtual machine (121) on that kernel.
The guest virtual machines (111 and 121) run the application programs. The guest virtual machine 111 of the host apparatus 11 is in the running state, and the guest virtual machine 121 of the stand-by equipment 12 is in the paused state.
The synchronizing software modules, or synchronizing software processes (112 and 122), preset the trigger condition for virtual memory synchronization and, when the trigger condition occurs, perform the virtual memory synchronization between the host apparatus 11 and the stand-by equipment 12. The trigger condition may comprise an I/O state change of the guest virtual machine system on the host apparatus. Specifically, an I/O state change covers changes in storage I/O state or network I/O state and is not limited to data changes; it also includes changes in I/O requests and other kinds of state change. Every I/O state change of the host apparatus 11 is monitored; its type is determined, and it is judged whether a checkpoint needs to be triggered, that is, whether memory page synchronization is needed. In one embodiment, the trigger condition for virtual memory synchronization between the master and slave devices is an I/O state change of the guest virtual machine system on the host apparatus, including changes in disk and network data, resource occupation state, time state, link state, and so on.
Virtual memory synchronization between the host apparatus 11 and the stand-by equipment 12 is implemented as follows: the guest virtual machine 111 of the host apparatus 11 is paused; the virtual memory pages of the guest virtual machine 111 whose content changed between the previous and the current occurrence of the trigger condition are determined and synchronized, so that the corresponding virtual memory pages of the guest virtual machine 121 in the stand-by equipment 12 are consistent with the changed pages in the host apparatus 11; the guest virtual machine 111 is then resumed and completes the I/O operations covered by this virtual memory synchronization, and the disk data of the host apparatus 11 is kept consistent with that of the stand-by equipment 12 by disk mirroring.
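The description mentions incremental disk synchronization based on a disk sector bitmap index, followed by an ACK from the stand-by side, without giving further detail. The following sketch shows one plausible reading; `MirroredDisk` and `sync_disk` are hypothetical names, and the 512-byte sector size and the comparison-based ACK are assumptions.

```python
class MirroredDisk:
    """Toy logical disk volume with a per-sector change bitmap."""

    def __init__(self, num_sectors, sector_size=512):
        self.sectors = [bytes(sector_size)] * num_sectors
        self.bitmap = bytearray(num_sectors)  # 1 = changed since the last sync

    def write(self, sector_no, data):
        self.sectors[sector_no] = data
        self.bitmap[sector_no] = 1


def sync_disk(primary, standby):
    """Copy only the sectors flagged in the bitmap; return (ack, changed sectors)."""
    changed = [i for i, bit in enumerate(primary.bitmap) if bit]
    for i in changed:
        standby.sectors[i] = primary.sectors[i]
    # Assumption: the ACK confirms that every copied sector now matches.
    ack = all(standby.sectors[i] == primary.sectors[i] for i in changed)
    if ack:
        for i in changed:
            primary.bitmap[i] = 0  # clear the bitmap only after confirmation
    return ack, changed


primary_disk = MirroredDisk(num_sectors=16)
standby_disk = MirroredDisk(num_sectors=16)
primary_disk.write(3, b"a" * 512)
primary_disk.write(9, b"b" * 512)
ack, changed_sectors = sync_disk(primary_disk, standby_disk)
```

Clearing the bitmap only after the ACK means that an interrupted synchronization is simply retried with the same set of sectors.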
When the trigger condition occurs, the guest virtual machine 111 of the host device 11 is paused; that is, the host is held in a "ready" state. This "ready" state is called a checkpoint. The interval from one occurrence of the trigger condition to the next, that is, the time from one checkpoint to the next, is called the checkpoint cycle.

In one embodiment, the virtual memory pages of the guest virtual machine 111 whose contents changed between the previous occurrence of the trigger condition and the current one are identified, and the contents of all identified pages are transmitted to the standby device 12, so that the corresponding virtual memory pages of the guest virtual machine 121 on the standby device 12 match the identified pages on the host device 11. Specifically, the host device 11 records the virtual memory page modifications that occur during the checkpoint cycle and, when synchronizing with the virtual memory of the standby device 12, synchronizes only the pages that were modified, so that the page contents of the guest virtual machine 121 on the standby device 12 become fully consistent with the page contents of the guest virtual machine 111 on the host device 11. This is the method commonly used for virtual memory synchronization.

In another embodiment, when synchronizing with the virtual memory of the standby device 12, the contents of all virtual memory pages of the guest virtual machine 111 on the host device 11 are transmitted to the standby device 12, so that every virtual memory page of the guest virtual machine 121 on the standby device 12 matches the corresponding page of the guest virtual machine 111 on the host device 11. That is, all virtual memory pages on the host device 11 are synchronized, so that the page contents of the guest virtual machine 121 become fully consistent with the page contents of the guest virtual machine 111.

The guest virtual machine 111 of the host device 11 is then resumed and completes the I/O operations covered by this virtual memory synchronization, and the disk data of the host device 11 is kept consistent with the disk data of the standby device 12.
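The two synchronization variants described above (changed pages only, or all pages) can be illustrated with a minimal Python sketch. This is a toy model under stated assumptions, not the patented implementation: the names `GuestMemory`, `write`, and `checkpoint` are hypothetical, memory is modelled as a dictionary of page numbers, and the guest is assumed to be paused by the caller while `checkpoint` runs.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class GuestMemory:
    """Toy model of a guest VM's virtual memory with dirty-page tracking."""
    pages: Dict[int, bytes] = field(default_factory=dict)
    dirty: Set[int] = field(default_factory=set)

    def write(self, page_no: int, data: bytes) -> None:
        self.pages[page_no] = data
        self.dirty.add(page_no)  # record that this page changed in the current cycle


def checkpoint(primary: GuestMemory, standby: GuestMemory, full: bool = False) -> int:
    """Copy pages from primary to standby and return how many pages were sent.

    full=False sends only pages modified since the last checkpoint;
    full=True sends every page (the second embodiment).
    """
    to_send = set(primary.pages) if full else set(primary.dirty)
    for page_no in to_send:
        standby.pages[page_no] = primary.pages[page_no]
    primary.dirty.clear()  # a new checkpoint cycle starts here
    return len(to_send)


primary, standby = GuestMemory(), GuestMemory()
primary.write(0, b"a")
primary.write(1, b"b")
first = checkpoint(primary, standby)   # both pages are new, so both are sent
primary.write(1, b"c")
second = checkpoint(primary, standby)  # only page 1 changed in this cycle
```

Tracking the dirty set keeps the amount of data sent proportional to what changed in the cycle, which is why the incremental variant is the usual choice.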
Service network traffic must be synchronized around checkpoints. Network transmission relies on an I/O buffer: only after the synchronization within a checkpoint cycle has finished are the newly buffered I/O requests released, and the system then prepares for the next checkpoint cycle. The number of checkpoints per unit time is called the checkpoint rate, which may be expressed in cycles per second. In a computer system, the characteristics of the application load determine the checkpoint frequency. How many memory pages are modified within a checkpoint cycle depends mainly on the network I/O (transmission) rate. With fewer checkpoint cycles per second, the guest operating system has more capacity for heavy computation. A higher checkpoint rate indicates measurable resource usage; a rate below 200 cycles per second usually means the system is not busy. Checkpoint cycles consume physical machine memory and synchronization network bandwidth. A higher checkpoint rate shortens the I/O buffer release period and therefore reduces network I/O latency. As network activity increases, checkpoint cycles become more frequent and the resulting delay decreases. To keep guest applications responsive, flow control is applied to virtual I/O request responses for both disk and network. In one embodiment, the limit is 5 MB per second of file transfer bandwidth for the network (per virtual network adapter) and 50 MB per second of file reads and writes for disk.
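The text states the flow-control limits (5 MB per second for network, 50 MB per second for disk) but not the mechanism that enforces them. One common way to enforce such limits is a token bucket; the Python sketch below is an illustration of that technique under assumptions, not the mechanism used in the patent. The class name `TokenBucket`, the one-second burst size, and the use of decimal megabytes are all assumptions; the clock is passed in explicitly so the behaviour is deterministic.

```python
class TokenBucket:
    """Token-bucket rate limiter with an explicitly supplied clock (seconds)."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float) -> None:
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0            # time of the previous call

    def allow(self, nbytes: int, now: float) -> bool:
        """Return True and consume tokens if nbytes may be transferred at time now."""
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never above the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


MB = 1_000_000  # assumption: decimal megabytes

# Limits taken from the embodiment; burst size of one second's worth is assumed.
network_limit = TokenBucket(rate_bytes_per_s=5 * MB, burst_bytes=5 * MB)
disk_limit = TokenBucket(rate_bytes_per_s=50 * MB, burst_bytes=50 * MB)

ok_first = network_limit.allow(4 * MB, now=0.0)    # fits in the initial burst
ok_second = network_limit.allow(2 * MB, now=0.0)   # refused: only 1 MB of tokens left
ok_third = network_limit.allow(2 * MB, now=0.5)    # allowed after 0.5 s of refill
```

A request that is refused would simply be retried later, which has the effect of delaying the guest's I/O rather than dropping it.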
In one embodiment, the host device 11 (a host machine, master server, or the like) is the primary node. The storage and network I/O operations of the primary node are all tied to the memory synchronization process: network state is held in the I/O buffer of the active primary node; storage reads and writes are executed on the primary node, but storage writes are held in the I/O buffer of the secondary node. Every network transmission request is monitored to determine its type and to decide whether a checkpoint must be triggered and memory pages synchronized. If so, the guest virtual machine 111 running on the primary node is paused, and the memory page contents modified in this checkpoint cycle (since the previous checkpoint) are collected and sent to the memory synchronization process on the secondary node. Once the modified page contents have been captured from the primary node, the guest virtual machine resumes running ... The memory synchronization process running on the secondary node maps the modified page contents into local memory and triggers a set of execution requests, which release the network transmission requests held in the primary node's I/O buffer and the disk write requests held in the secondary node's I/O buffer, respectively. Note in particular that disk write requests are held only on the secondary node, so the secondary node's disk contents represent a mirror of the data as it was "before the checkpoint was executed". If the primary node crashes before the checkpoint completes, the secondary node retains the data of the last completed checkpoint and regenerates the I/O transmission requests. During recovery, disk mirroring ensures that the data on both sides is consistent.
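The hold-and-release behaviour of the I/O buffers can be sketched in Python as follows. This is a simplified illustration under assumptions: the class and method names (`IOBuffer`, `CheckpointedIO`, `checkpoint_done`, `primary_failed`) are hypothetical, requests are plain strings, and discarding held requests when the primary fails before a checkpoint completes is an assumption made for the model (the text only says the secondary keeps the last completed checkpoint and regenerates the I/O requests).

```python
from collections import deque
from typing import Deque, List, Tuple


class IOBuffer:
    """Holds I/O requests until a checkpoint completes, then releases them in order."""

    def __init__(self) -> None:
        self.held: Deque[str] = deque()
        self.released: List[str] = []

    def hold(self, request: str) -> None:
        self.held.append(request)

    def release_all(self) -> List[str]:
        batch = list(self.held)
        self.held.clear()
        self.released.extend(batch)
        return batch


class CheckpointedIO:
    """Network output is held on the primary; disk writes are held on the secondary."""

    def __init__(self) -> None:
        self.primary_net = IOBuffer()
        self.secondary_disk = IOBuffer()

    def send_packet(self, packet: str) -> None:
        self.primary_net.hold(packet)

    def write_disk(self, block: str) -> None:
        self.secondary_disk.hold(block)

    def checkpoint_done(self) -> Tuple[List[str], List[str]]:
        # Memory is now consistent on both nodes, so the held I/O may become visible.
        return self.primary_net.release_all(), self.secondary_disk.release_all()

    def primary_failed(self) -> None:
        # Assumption: requests from the unfinished cycle are dropped, leaving the
        # secondary at the state of the last completed checkpoint.
        self.primary_net.held.clear()
        self.secondary_disk.held.clear()


io = CheckpointedIO()
io.send_packet("p1")
io.write_disk("w1")
sent, written = io.checkpoint_done()  # "p1" and "w1" are released together
```

Holding output until the checkpoint completes means the outside world never observes state that the secondary node could not reproduce after a failover.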
The fault management module or fault management process (113 and 123) manages the hardware of the host device, the guest virtual machines (111 and 121), and the synchronization software modules (112 and 122), and performs fault recovery. Specifically, when the fault management module detects that the host device has failed, it stops the guest virtual machine of the host device and starts the guest virtual machine of the standby device. Further, when the fault management module detects that the host device has failed, it stops the guest virtual machine of the host device and starts the guest virtual machine of the standby device together with its external I/O communication, so that it accepts client access and management operations; this completes one transfer of control over access to the guest virtual machine.
In one embodiment, as shown in Figure 4, the fault detection components (fault management modules 113 and 123) are connected by a private network, and the network control module carries the virtual memory synchronization data between the host device 11 and the standby device 12 on behalf of the memory synchronization process (implemented by the memory synchronization modules 112 and 122). When the host device 11 fails, the guest virtual machine is migrated: execution moves from the guest virtual machine 111 on the host device 11 to the guest virtual machine 121 on the standby device 12. The I/O buffer is not used during this memory migration. The standby device 12 then becomes the host device 11 and runs the guest virtual machine (121 becomes 111), and the original host device is taken out of service. After a successful memory migration there is a brief pause in network requests while the guest virtual machine 111, now running on the former secondary node, confirms the checkpoint cycle. This pause is shorter than 1 millisecond, which is negligible for service network traffic, and the Ethernet link state and TCP data transmission are unaffected. The system thus delivers a zero-second service switchover whenever a host fails.

If the host device node has crashed, the virtual memory synchronization process stops and the guest virtual machine 111 no longer runs fault-tolerantly. This is called the "degraded" state: the system works in single-machine mode, running on a single host device only, and the disk I/O replication process stops.

More importantly, if the original host device 11 node has a fault but is not down (for example, a network interruption or a fan failure), the guest virtual machine is still migrated from the guest virtual machine 111 on the host device 11 to the guest virtual machine 121 on the standby device 12, but the virtual memory synchronization process continues to run normally and disk synchronization also proceeds normally. The former standby device 12 simply becomes the host device 11, and the guest virtual machine migrates (121 becomes 111). The guest virtual machine 111 is then in a single-machine fault-tolerant state, which is also called "degraded"; the system works in a not-fully-fault-tolerant mode. Thanks to this property, faults can occur on the host device and the standby device in different components at the same time (cross faults, also called cross cooperation): for example, if a network component of the host device fails and a disk component of the standby device fails, the guest virtual machine still runs normally.
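The two failover cases described above (the host node is down, or the host node has a partial fault such as a network interruption or fan failure) can be summarized as a small state model. The Python sketch below is illustrative only: the type names `Fault`, `Mode`, and `Pair`, the device labels, and the function `fail_over` are hypothetical, and the model records only which processes keep running after the guest virtual machine has migrated.

```python
from dataclasses import dataclass
from enum import Enum


class Fault(Enum):
    NODE_DOWN = "node_down"  # the host node has crashed
    PARTIAL = "partial"      # e.g. network interruption or fan failure


class Mode(Enum):
    FAULT_TOLERANT = "fault_tolerant"
    DEGRADED_SINGLE = "degraded_single_machine"
    DEGRADED_PARTIAL = "degraded_not_fully_fault_tolerant"


@dataclass
class Pair:
    active: str = "device-11"    # device currently running the guest VM
    standby: str = "device-12"
    mode: Mode = Mode.FAULT_TOLERANT
    memory_sync_running: bool = True
    disk_mirror_running: bool = True


def fail_over(pair: Pair, fault: Fault) -> Pair:
    """Swap roles after a fault on the active device and return the new state."""
    new = Pair(active=pair.standby, standby=pair.active)
    if fault is Fault.NODE_DOWN:
        # The old host is gone: synchronization and disk replication stop.
        new.mode = Mode.DEGRADED_SINGLE
        new.memory_sync_running = False
        new.disk_mirror_running = False
    else:
        # The old host is still up: synchronization and mirroring continue.
        new.mode = Mode.DEGRADED_PARTIAL
    return new


after_crash = fail_over(Pair(), Fault.NODE_DOWN)
after_partial = fail_over(Pair(), Fault.PARTIAL)
```

In both cases the guest virtual machine ends up on the former standby device; the difference is whether memory synchronization and disk mirroring are still available afterwards.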
In one embodiment, the computer fault-tolerant system consists of two identically configured servers (one primary, one secondary) connected to each other through network interface 1 (NIC 1) using Category 5e twisted-pair cable. Two integrated 10-gigabit network interfaces are also connected pairwise, using multimode fiber patch cords with LC connectors. A KVM virtualization kernel is used: the open-source host system CentOS Linux 6.5 or later is installed, and the KVM virtual machine components are installed and enabled. Applications or databases run on the guest virtual machine (121 becomes 111). The synchronization software modules (112 and 122) provide memory synchronization and an arbitration service, which prevents a "split-brain" condition when the synchronization link is interrupted. The fault management modules (113 and 123) create a Domain0 virtual host, which may run CentOS or another mainstream Linux distribution, and use Apache Tomcat Server to provide a custom-developed user interface (UI). The fault management modules (113 and 123) import the IPMI data packets of the server motherboard, so that the whole fault-tolerant system can be maintained through a browser; the managed objects include the host hardware, the guest virtual machines, and the synchronization state, with resource allocation and fault handling functions.
In summary, the computer fault-tolerant method and system of the present invention have the following beneficial effects. The risk of downtime caused by faults is avoided: when the host device fails, the service migrates to the standby device, and because the memory data of the standby device is kept synchronized with the memory data of the host device at each checkpoint, the operating system and software programs continue to run unaffected. Service migration takes zero seconds, applications run without interruption, no data is lost, and reliability is higher. In addition, the technical solution is independent of the operating system layer and is therefore more widely applicable. The present invention thus effectively overcomes various shortcomings of the prior art and has high industrial value.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify or alter the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or alterations made by persons of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present invention.
Claims (11)
1. A computer fault-tolerant method, applied to a computer fault-tolerant system comprising a host device and a standby device, characterized in that the host device and the standby device each comprise a guest virtual machine built on a virtualization kernel, and the computer fault-tolerant method comprises:
presetting a trigger condition for virtual memory synchronization;
when the trigger condition occurs, completing one virtual memory synchronization between the host device and the standby device.
2. The computer fault-tolerant method according to claim 1, characterized in that the computer fault-tolerant method further comprises: when the host device fails, stopping the guest virtual machine of the host device and starting the guest virtual machine of the standby device, so that the standby device takes over the work of the host device.
3. The computer fault-tolerant method according to claim 1, characterized in that completing one virtual memory synchronization between the host device and the standby device comprises: pausing the guest virtual machine of the host device; making the memory page contents of the guest virtual machine of the standby device fully consistent with the memory page contents of the guest virtual machine of the host device; and resuming the guest virtual machine of the host device, which completes the I/O operations covered by this virtual memory synchronization, while ensuring that the disk data of the host device and the disk data of the standby device are fully consistent.
4. The computer fault-tolerant method according to claim 3, characterized in that making the page contents of the guest virtual machine of the standby device fully consistent with the page contents of the guest virtual machine of the host device comprises: determining the virtual memory pages of the guest virtual machine whose contents changed between the previous occurrence of the trigger condition and the current occurrence of the trigger condition, and transmitting the contents of all determined virtual memory pages to the standby device, so that the contents of the corresponding virtual memory pages of the guest virtual machine of the standby device are consistent with the contents of the determined virtual memory pages on the host device.
5. The computer fault-tolerant method according to claim 1, characterized in that the trigger condition for virtual memory synchronization comprises: a change in the I/O state of the guest virtual machine of the host device.
6. A computer fault-tolerant system, comprising a host device and a standby device, characterized in that the host device and the standby device each comprise a synchronization software module, a fault management module, and a guest virtual machine module; when the computer fault-tolerant system is running, the synchronization software module, fault management module, and guest virtual machine module of the host device and of the standby device all run on a virtualization kernel, corresponding respectively to a synchronization software process, a fault management process, and a guest virtual machine on that virtualization kernel;
wherein:
the guest virtual machine is used to run application programs; the guest virtual machine of the host device is in a running state, and the guest virtual machine of the standby device is in a state of running in synchronization but not open to access or management;
the synchronization software process is used to preset the trigger condition for virtual memory synchronization and, when the trigger condition occurs, to perform virtual memory synchronization between the host device and the standby device; the synchronization software process of the host device is in a running state, and the synchronization software process of the standby device is in a state of running in synchronization but not open to access or management;
the fault management process is used to manage the hardware of the host device and the standby device, the guest virtual machines, and the synchronization software processes, and to perform fault recovery; the fault management process of the host device is in a running state, and the fault management process of the standby device is in a state of running in synchronization but not open to access or management.
7. The computer fault-tolerant system according to claim 6, characterized in that the virtual memory synchronization is implemented by: pausing the guest virtual machine of the host device; determining the virtual memory pages of the guest virtual machine whose contents changed between the previous occurrence of the trigger condition and the current occurrence of the trigger condition, and synchronizing the changed virtual memory pages, so that the contents of the corresponding virtual memory pages of the guest virtual machine of the standby device are consistent with the contents of the changed virtual memory pages on the host device; at the same time, the host device and the standby device write the page contents to their respective logical disk volumes and then release the I/O buffer; and resuming the guest virtual machine of the host device, which completes the I/O read and write operations covered by this virtual memory synchronization, while ensuring that the disk data of the host device is consistent with the disk data of the standby device.
8. The computer fault-tolerant system according to claim 6, characterized in that the trigger condition for virtual memory synchronization comprises: a change in the I/O state of the guest virtual machine of the host device.
9. The computer fault-tolerant system according to claim 6, characterized in that the maximum read/write speed of storage I/O operations on the guest virtual machine is 50 MB per second.
10. The computer fault-tolerant system according to claim 6, characterized in that the maximum read/write speed of network I/O operations on the guest virtual machine is 5 MB per second.
11. The computer fault-tolerant system according to claim 6, characterized in that when the fault management process of the host device detects that the host device has failed, the guest virtual machine of the host device is stopped, the guest virtual machine of the standby device is started, and the synchronization software process and the fault management process of the standby device are started, so that the standby device takes over the work of the host device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410632804.3A CN104391764B (en) | 2014-10-22 | 2014-11-11 | Computer fault-tolerant method and system |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410567923 | 2014-10-22 | ||
CN2014105679235 | 2014-10-22 | ||
CN201410632804.3A CN104391764B (en) | 2014-10-22 | 2014-11-11 | Computer fault-tolerant method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104391764A true CN104391764A (en) | 2015-03-04 |
CN104391764B CN104391764B (en) | 2018-02-16 |
Family
ID=52609672
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410632804.3A Active CN104391764B (en) | 2014-10-22 | 2014-11-11 | Computer fault-tolerant method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104391764B (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105373418A (en) * | 2015-11-25 | 2016-03-02 | 北京汉柏科技有限公司 | Implementation method and device of virtual machine FT (Fault Tolerance) |
CN105471994A (en) * | 2015-12-01 | 2016-04-06 | 华为技术有限公司 | Control method and device |
CN106254236A (en) * | 2016-08-05 | 2016-12-21 | 成都广达新网科技股份有限公司 | A kind of multiserver slave method of work based on TCP event |
CN106970861A (en) * | 2017-03-30 | 2017-07-21 | 山东超越数控电子有限公司 | A kind of virtual machine fault-tolerance approach and system |
CN107315624A (en) * | 2017-06-30 | 2017-11-03 | 联想(北京)有限公司 | Information processing method and virtualization manager |
CN108885575A (en) * | 2016-04-01 | 2018-11-23 | 三菱电机株式会社 | The restoration processing method of control device and control device |
CN109150596A (en) * | 2018-08-08 | 2019-01-04 | 新智能源系统控制有限责任公司 | A kind of SCADA system real time data dump method and device |
CN112131088A (en) * | 2020-09-29 | 2020-12-25 | 北京计算机技术及应用研究所 | High availability method based on health examination and container |
CN112256477A (en) * | 2020-10-09 | 2021-01-22 | 上海云轴信息科技有限公司 | Virtualization fault-tolerant method and device |
CN113741248A (en) * | 2021-08-13 | 2021-12-03 | 北京和利时系统工程有限公司 | Edge calculation controller and control system |
CN114217905A (en) * | 2021-12-17 | 2022-03-22 | 北京志凌海纳科技有限公司 | High-availability recovery processing method and system for virtual machine |
CN114501057A (en) * | 2021-12-17 | 2022-05-13 | 阿里巴巴(中国)有限公司 | Data processing method, storage medium, processor and system |
CN115858222A (en) * | 2022-12-19 | 2023-03-28 | 安超云软件有限公司 | Virtual machine fault processing method and system and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101034364A (en) * | 2007-04-02 | 2007-09-12 | 华为技术有限公司 | Method, device and system for implementing RAM date backup |
JP2011180871A (en) * | 2010-03-02 | 2011-09-15 | Nec Corp | Fault tolerant system and virtual machine construction method |
CN102262558A (en) * | 2011-08-04 | 2011-11-30 | 中兴通讯股份有限公司 | Synchronizing method and system of virtual machine |
CN103412800A (en) * | 2013-08-05 | 2013-11-27 | 华为技术有限公司 | Virtual machine warm backup method and equipment |
US8826283B2 (en) * | 2008-10-28 | 2014-09-02 | Vmware, Inc. | Low overhead fault tolerance through hybrid checkpointing and replay |
- 2014-11-11: CN application CN201410632804.3A, patent CN104391764B, status Active
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105373418A (en) * | 2015-11-25 | 2016-03-02 | 北京汉柏科技有限公司 | Implementation method and device of virtual machine FT (Fault Tolerance) |
CN105471994B (en) * | 2015-12-01 | 2019-01-15 | 华为技术有限公司 | A kind of control method and device |
CN105471994A (en) * | 2015-12-01 | 2016-04-06 | 华为技术有限公司 | Control method and device |
CN108885575A (en) * | 2016-04-01 | 2018-11-23 | 三菱电机株式会社 | The restoration processing method of control device and control device |
CN108885575B (en) * | 2016-04-01 | 2022-03-11 | 三菱电机株式会社 | Control device and restoration processing method for control device |
CN106254236A (en) * | 2016-08-05 | 2016-12-21 | 成都广达新网科技股份有限公司 | A kind of multiserver slave method of work based on TCP event |
CN106970861A (en) * | 2017-03-30 | 2017-07-21 | 山东超越数控电子有限公司 | A kind of virtual machine fault-tolerance approach and system |
CN107315624A (en) * | 2017-06-30 | 2017-11-03 | 联想(北京)有限公司 | Information processing method and virtualization manager |
CN107315624B (en) * | 2017-06-30 | 2020-11-20 | 联想(北京)有限公司 | Information processing method and virtualization manager |
CN109150596A (en) * | 2018-08-08 | 2019-01-04 | 新智能源系统控制有限责任公司 | A kind of SCADA system real time data dump method and device |
CN112131088A (en) * | 2020-09-29 | 2020-12-25 | 北京计算机技术及应用研究所 | High availability method based on health examination and container |
CN112131088B (en) * | 2020-09-29 | 2024-04-09 | 北京计算机技术及应用研究所 | High availability method based on health examination and container |
CN112256477A (en) * | 2020-10-09 | 2021-01-22 | 上海云轴信息科技有限公司 | Virtualization fault-tolerant method and device |
CN113741248A (en) * | 2021-08-13 | 2021-12-03 | 北京和利时系统工程有限公司 | Edge calculation controller and control system |
CN114217905A (en) * | 2021-12-17 | 2022-03-22 | 北京志凌海纳科技有限公司 | High-availability recovery processing method and system for virtual machine |
CN114501057A (en) * | 2021-12-17 | 2022-05-13 | 阿里巴巴(中国)有限公司 | Data processing method, storage medium, processor and system |
CN114501057B (en) * | 2021-12-17 | 2024-06-14 | 阿里巴巴(中国)有限公司 | Data processing method, storage medium, processor and system |
CN115858222A (en) * | 2022-12-19 | 2023-03-28 | 安超云软件有限公司 | Virtual machine fault processing method and system and electronic equipment |
CN115858222B (en) * | 2022-12-19 | 2024-01-02 | 安超云软件有限公司 | Virtual machine fault processing method, system and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN104391764B (en) | 2018-02-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |