CN102708028B

CN102708028B - Trusted redundant fault-tolerant computer system

Info

Publication number: CN102708028B
Application number: CN201210154659.3A
Authority: CN
Inventors: 杨明华; 慈林林; 陈晓峰; 葛根焰; 郑建群; 杨银刚; 杨斌; 黄亮; 何水发; 施鸿程; 陈强; 李轩涯; 程宾
Original assignee: FOURTH RESEARCH INSTITUTE OF SECOND ARTILLERY EQUIPMENT ACADEMY OF PLA
Current assignee: FOURTH RESEARCH INSTITUTE OF SECOND ARTILLERY EQUIPMENT ACADEMY OF PLA
Priority date: 2012-05-18
Filing date: 2012-05-18
Publication date: 2015-01-07
Anticipated expiration: 2032-05-18
Also published as: CN102708028A

Abstract

The invention provides a trusted redundant fault-tolerant computer system that meets the high-security, high-reliability requirements of systems in safety control fields. The system can block the execution of illegitimate programs such as malicious code and viruses, protect the system and its core applications from damage, and protect important information from being leaked, stolen, tampered with, or destroyed. When a fault occurs, a failover function masks the fault so that the system continues to work normally. The system is based on a trusted cryptography module (TCM) security chip and uses dual-computer redundant hot standby and a Compact Peripheral Component Interconnect (CPCI) bus architecture. Two trusted computer subsystems are housed in one chassis. Each consists of a trusted computer main module (including a TCM and a flash disk), a power supply module, a flash disk expansion module, and an interface expansion module. Failover between the two subsystems is performed by a heartbeat server and a failover module.

Description

A trusted redundant fault-tolerant computer system

Technical field

The present invention relates to a highly trusted, highly reliable computer system and belongs to the technical field of safety control.

Background art

With the rapid development of computer technology, computers are widely used as system control equipment in fields such as aviation, aerospace, military affairs, and industrial control, and in other key areas that require uninterrupted "7 × 24" operation. As engineering systems grow in scale and complexity, a failure of trustworthiness or reliability in such a system can cause heavy losses of life and property. Examples include the series of nuclear safety incidents in Iran in 2011 and the malicious-code intrusion into a US UAV ground control system.

Users require computer control systems to be secure, trusted, and durably reliable. The introduction of trusted computing technology makes the reliability requirements on the system more pressing, particularly for certain storage devices. For this reason, the present invention combines trusted computing technology with fault-tolerance techniques, which can improve the reliability and security of complex computer control systems to a certain extent.

Summary of the invention

The object of the present invention is to provide a trusted redundant fault-tolerant computer system. A TCM root-of-trust module added to each subsystem provides software and hardware protection for the computer and ensures that it is secure and trusted. Redundant backup provides system fault tolerance and improves the mission reliability of the whole system.

To achieve this object, the present invention embeds a TCM module in the hardware platform of each subsystem to build a root of trust and, through a chain-of-trust mechanism, extends trust from the physical hardware layer up to the user application layer, giving the user a trusted execution environment. At power-on, a card-insertion start-up mode establishes a mandatory identity authentication mechanism before the system boots, preventing the system from being used by unauthorized persons. The cryptographic services provided by the TCM give hardware-level protection to sensitive data being processed and stored, preventing malicious users from destroying or stealing confidential data.

For reliability, two trusted computer subsystems are configured in one chassis to form a hot-standby system that provides redundant fault tolerance. The two subsystems form machines A and B, with heartbeat detection and data synchronization between them. Normally only one machine handles business online; when it fails, the other machine takes over. The heartbeat server is responsible for detecting faults and performing failover. After failover, services and applications continue to run on the other machine, and each application can restore its most recent running state from the checkpoint information saved in the database. The failover module switches the interface circuits.

Brief description of the drawings

Fig. 1 is the connection diagram of the trusted redundant fault-tolerant computer system of the present invention;

Fig. 2 is the adaptation diagram between the trusted computer main module and the TCM module of the present invention;

Fig. 3 is the working principle diagram of the trusted redundant fault-tolerant computer system of the present invention;

Fig. 4 is the schematic diagram of the switching control circuit of the present invention;

Fig. 5 is the failover flow chart of the trusted redundant fault-tolerant computer system of the present invention;

Fig. 6 is the working schematic diagram of the failover module of the present invention.

Detailed description of the embodiments

Referring to Fig. 1, the system adopts a dual-redundant, dual-active high-availability cluster mode with a CPCI architecture and a 1+1 redundant power supply.

Referring to Fig. 2, in each trusted computer subsystem a medium-scale FPGA is added between the TCM and the mainboard BIOS and CPU. The FPGA performs bus interface conversion and bus switch control among the TCM, the BIOS, and the processor system, as follows:

1) The TCM is connected to the FPGA via SPI, which the FPGA converts to LPC for connection to the BIOS and CPU, implementing active measurement.

2) The TCM's custom PSRAM bus interface is converted to PCI by the FPGA. The CPU accesses and invokes the trusted services of the TCM over the PCI bus.

3) The CPU reset signal is routed into the FPGA and is controlled by the TCM. At power-on the TCM starts first and holds the CPU in the reset state.

4) The BIOS is attached to the FPGA via LPC, and the FPGA is connected bidirectionally to the CPU via LPC. An LPC bus switch inside the FPGA isolates the BIOS from the CPU.

5) After the TCM starts, it first measures the BIOS. Once the measurement passes, the LPC bus switch inside the FPGA closes, the CPU reads the BIOS start-up configuration information, and the system boots normally.
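
The power-on sequence in steps 1) to 5) can be sketched as a small simulation. This is an illustrative model only: the class name `TrustedBootGate`, the use of SHA-256 as the measurement hash, and the stored reference digest are assumptions rather than details given in the description, and releasing the CPU from reset after a successful measurement is implied by the text rather than stated.

```python
import hashlib


class TrustedBootGate:
    """Toy model of the TCM/FPGA boot gate (all names are hypothetical)."""

    def __init__(self, expected_bios_digest: str):
        self.expected_bios_digest = expected_bios_digest
        self.cpu_in_reset = True        # TCM holds the CPU in reset at power-on
        self.lpc_switch_closed = False  # BIOS is isolated from the CPU

    def power_on(self, bios_image: bytes) -> bool:
        """Measure the BIOS; connect it to the CPU only if the measurement passes."""
        digest = hashlib.sha256(bios_image).hexdigest()  # SHA-256 is an assumption
        if digest != self.expected_bios_digest:
            return False  # measurement failed: CPU stays in reset, switch stays open
        self.lpc_switch_closed = True   # FPGA closes its internal LPC bus switch
        self.cpu_in_reset = False       # CPU is released and can read the BIOS
        return True


good_bios = b"reference BIOS image"
reference = hashlib.sha256(good_bios).hexdigest()

gate = TrustedBootGate(reference)
booted = gate.power_on(good_bios)

tampered_gate = TrustedBootGate(reference)
tampered_booted = tampered_gate.power_on(b"tampered BIOS image")
```

In this sketch a tampered image leaves the CPU in reset with the LPC switch open, which mirrors the isolation described in step 4).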

Trusted computer subsystem A and trusted computer subsystem B are powered on and working at the same time, but only one of them handles business at any given moment. The subsystem that handles business is called the primary subsystem; the other, which is powered on but not running business, is called the backup subsystem. Switching between the primary and backup subsystems can be triggered manually with an external changeover switch, or triggered automatically by the heartbeat mechanism when it determines that the primary subsystem is operating abnormally.

Referring to Fig. 3, in the manual trigger mode the operator presses the external changeover switch, as the situation requires, to perform failover. When the external changeover switch is open (the default state), the control terminal is at low level and the relay connects the internal control signal to GND, so the switching control signal output is low. When the external changeover switch is closed, the control terminal is at high level and the relay connects the internal control signal to +5 V, so the switching control signal output is high. The failover module switches the interface circuits according to the external switching control signal it receives.

Referring to Fig. 4, each trusted computer subsystem is configured with a heartbeat server. The heartbeat server runs in kernel space as a system-level process. It monitors in real time the running state (running or hung) of the local applications and services, listens for the heartbeats sent by the peer system's heartbeat server, and sends its own heartbeat to the peer. Based on the results of local and peer heartbeat detection, the heartbeat server can wake up or suspend the relevant applications or services. To improve the reliability of the heartbeat path, the system uses two heartbeat detection paths: UDP over the network port and the COM (serial) port.
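
A minimal sketch of how two heartbeat paths might be combined follows. The description does not say how results from the UDP and COM paths are merged, so the rule used here (the peer counts as alive if either path delivered a heartbeat within the timeout), the 3-second timeout, and the class name `DualPathHeartbeatMonitor` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DualPathHeartbeatMonitor:
    """Tracks peer heartbeats on two paths (names and timeout are hypothetical)."""

    timeout_s: float = 3.0
    last_seen: dict = field(default_factory=lambda: {"udp": None, "com": None})

    def record(self, path: str, now: float) -> None:
        """Note that a heartbeat arrived on `path` ("udp" or "com") at time `now`."""
        if path not in self.last_seen:
            raise ValueError(f"unknown heartbeat path: {path}")
        self.last_seen[path] = now

    def peer_alive(self, now: float) -> bool:
        """Assumed rule: the peer is alive if either path is still fresh."""
        return any(
            t is not None and now - t <= self.timeout_s
            for t in self.last_seen.values()
        )


monitor = DualPathHeartbeatMonitor(timeout_s=3.0)
monitor.record("udp", now=0.0)
monitor.record("com", now=0.0)
monitor.record("com", now=4.0)          # UDP path goes quiet, COM path still works
alive_on_one_path = monitor.peer_alive(now=5.0)
alive_after_silence = monitor.peer_alive(now=10.0)
```

With this rule, the loss of a single path (for example a congested network) does not by itself mark the peer as failed.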

Further, to increase the accuracy of heartbeat detection and to avoid heartbeat timeouts caused by network congestion or an overloaded system triggering a switchover in a healthy system, the heartbeat server combines PUSH and PULL detection. Under normal conditions the two sides detect each other's heartbeats in PUSH mode. When a heartbeat server no longer hears the heartbeat sent by its peer, it automatically switches to PULL mode and actively queries the peer to check its heartbeat further.
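
The PUSH-then-PULL decision can be sketched as below. The function name `peer_is_healthy`, the probe callback, and the limit of three probes are assumptions; the description only states that active inquiry follows a missed pushed heartbeat.

```python
from typing import Callable


def peer_is_healthy(
    pushed_heartbeat_received: bool,
    pull_probe: Callable[[], bool],
    max_probes: int = 3,
) -> bool:
    """PUSH first; fall back to PULL probes before declaring a fault."""
    if pushed_heartbeat_received:
        return True  # normal case: the peer's pushed heartbeat arrived
    # PUSH heartbeat missed: actively query the peer (PULL) a few times.
    for _ in range(max_probes):
        if pull_probe():
            return True  # peer answered, so the missed heartbeat was a false alarm
    return False  # no pushed heartbeat and no answer to any probe


# A peer that is merely busy: misses the push, answers the second probe.
answers = iter([False, True])
busy_peer = peer_is_healthy(False, lambda: next(answers))

# A peer that is really down: never answers.
dead_peer = peer_is_healthy(False, lambda: False)
```

Only the second case would lead to a failover, which is the false-switchover protection the combined mode is meant to provide.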

To ensure that the backup subsystem can take over control smoothly when the primary subsystem fails, a database mirroring engine lets the two subsystems know each other's working state. The primary subsystem periodically writes important hardware status data and software running data to its local database, and the database mirroring engine periodically synchronizes that data to the backup subsystem. The backup subsystem can therefore obtain the data from the primary subsystem at any time by accessing its own local database, so that failover proceeds smoothly when the primary subsystem fails.
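
The write-then-mirror pattern described above can be sketched with two in-memory SQLite databases standing in for the local databases of the two subsystems. The choice of SQLite, the `state` table layout, and the key names are assumptions for illustration; the description does not name a database product or schema, and a real mirroring engine would run periodically rather than once.

```python
import sqlite3

SCHEMA = "CREATE TABLE state (key TEXT PRIMARY KEY, value TEXT, updated REAL)"


def open_local_db() -> sqlite3.Connection:
    """Create one subsystem's local database (in memory for this sketch)."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    return db


def write_state(db: sqlite3.Connection, key: str, value: str, now: float) -> None:
    """Primary side: record one status item in the local database."""
    db.execute("INSERT OR REPLACE INTO state VALUES (?, ?, ?)", (key, value, now))
    db.commit()


def mirror(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Mirroring engine: copy every row from the primary to the backup."""
    rows = source.execute("SELECT key, value, updated FROM state").fetchall()
    target.executemany("INSERT OR REPLACE INTO state VALUES (?, ?, ?)", rows)
    target.commit()
    return len(rows)


primary_db, backup_db = open_local_db(), open_local_db()
write_state(primary_db, "cpu_temp_c", "48", now=1.0)
write_state(primary_db, "app.checkpoint", "step=17", now=1.0)
copied = mirror(primary_db, backup_db)

# The backup reads only its own local database.
checkpoint = backup_db.execute(
    "SELECT value FROM state WHERE key = ?", ("app.checkpoint",)
).fetchone()[0]
```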

Control authority in the system is not shared equally between the primary and backup subsystems. While the primary subsystem runs the business, the backup subsystem outputs no control, and the primary subsystem has full control of system operation. Only when a failover occurs does the backup subsystem quickly use the stored system state data to restore the operating context and take over control of the system, ensuring that the business continues to run stably.

Referring to Fig. 5, when the primary subsystem fails, the heartbeat server is responsible for restoring the services and applications that were running on the primary subsystem onto the backup subsystem. The recovery process is as follows:

1) Using the list of services that were running on the primary subsystem, recorded before the fault, the backup subsystem starts or wakes up the corresponding services, so that services are restored smoothly.

2) Using the list of applications that were running on the primary subsystem, recorded before the fault, the backup subsystem starts or wakes up the corresponding applications. Each application can restore its most recent running state from the checkpoint information saved in the database before the fault.

3) The failover module connects the backup subsystem to the external interface.

4) The backup subsystem becomes the primary subsystem.

5) Once the failed subsystem has been repaired, it rejoins the system as the backup subsystem and performs the backup function.
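
The five recovery steps can be sketched as a role swap over two records. The `Subsystem` class, the `"failed"` role label, and the shape of the recorded state (a list of services plus a mapping from application to checkpoint) are hypothetical; the description does not define these data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Subsystem:
    name: str
    role: str                      # "primary", "backup", or "failed" (assumed labels)
    running: list = field(default_factory=list)
    external_interface_connected: bool = False


def fail_over(failed: Subsystem, backup: Subsystem, recorded: dict) -> None:
    """Apply recovery steps 1) to 4) to the backup subsystem."""
    for service in recorded["services"]:              # step 1: restore services
        backup.running.append(service)
    for app, checkpoint in recorded["apps"].items():  # step 2: restore applications
        backup.running.append(f"{app}@{checkpoint}")  # resume from the checkpoint
    failed.external_interface_connected = False       # step 3: switch the interface
    backup.external_interface_connected = True
    backup.role = "primary"                           # step 4: backup becomes primary
    failed.role = "failed"


def rejoin(repaired: Subsystem) -> None:
    """Step 5: a repaired subsystem comes back as the backup."""
    repaired.running.clear()
    repaired.role = "backup"


a = Subsystem("A", "primary", ["logger", "control@cp-41"], True)
b = Subsystem("B", "backup")
recorded_state = {"services": ["logger"], "apps": {"control": "cp-41"}}

fail_over(a, b, recorded_state)
rejoin(a)
```

After these calls the roles are reversed: B runs the recorded services and applications as primary, and A waits as backup.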

Referring to Fig. 6, the failover module switches the external interfaces of trusted computer subsystems A and B. It consists mainly of a control module, interface signal switching circuit A, and interface signal switching circuit B. The switching circuits use a redundant design: there are two identical switching circuits, with the external interface of subsystem A connected to switching circuit A and the external interface of subsystem B connected to switching circuit B. Switching circuit A is powered by the power supply of subsystem A, and switching circuit B by the power supply of subsystem B. The control module receives a switching command from the heartbeat server or an external switch signal and performs the interface switch, as follows:

1) On receiving a switching command from the heartbeat server, the control module sends an enable control signal that puts the currently connected switching circuit into the floating (disconnected) state and puts the currently floating switching circuit into the connected state.

2) On receiving a signal from the changeover switch, the control module sends an enable control signal that puts the currently connected switching circuit into the floating state and puts the currently floating switching circuit into the connected state. At the same time, the control module sends a switchover notification message to the heartbeat server, and the heartbeat server completes the switchover of upper-layer applications and services.
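
The two trigger cases differ only in whether the heartbeat server has to be notified, which the following sketch makes explicit. The class `FailoverModule`, its method names, and the initial state (circuit A connected) are assumptions for illustration.

```python
class FailoverModule:
    """Toy control module with two switching circuits (names are hypothetical)."""

    def __init__(self, notify_heartbeat_server):
        self.connected = "A"   # circuit currently connected; the other one floats
        self.notify_heartbeat_server = notify_heartbeat_server

    def _swap(self) -> None:
        # Enable signal: connected circuit floats, floating circuit connects.
        self.connected = "B" if self.connected == "A" else "A"

    def on_heartbeat_server_command(self) -> None:
        """Case 1: the heartbeat server initiated the switchover itself."""
        self._swap()

    def on_external_switch(self) -> None:
        """Case 2: manual switch; the heartbeat server must be told."""
        self._swap()
        self.notify_heartbeat_server(self.connected)


notifications = []
module = FailoverModule(notifications.append)
module.on_heartbeat_server_command()   # A -> B, no notification
after_command = module.connected
module.on_external_switch()            # B -> A, notification sent
after_manual = module.connected
```

The notification in case 2 lets the heartbeat server move upper-layer applications and services to match an interface switch it did not initiate.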

Claims

1. A trusted redundant fault-tolerant computer, characterized in that it comprises two identical computer subsystems that serve as redundant backups for each other, a heartbeat server, a failover module, and a trusted cryptography module (TCM);

The trusted cryptography module TCM is embedded in the hardware platform of each computer subsystem and, through a chain-of-trust mechanism, extends trust from the physical hardware layer of the computer subsystem up to the user application layer; a card-insertion start-up mode establishes a mandatory identity authentication mechanism before system boot; the cryptographic services provided by the trusted cryptography module TCM give hardware-level protection to sensitive data being processed and stored; the TCM is connected to an FPGA via SPI, which the FPGA converts to LPC for connection to the BIOS and CPU, implementing active measurement; the TCM is connected to the FPGA via PSRAM, which the FPGA converts to PCI for connection to the CPU, and the CPU accesses and invokes the trusted services of the trusted cryptography module TCM over the PCI bus;

After the trusted cryptography module TCM starts, it first measures the BIOS; once the measurement passes, the LPC bus switch inside the FPGA closes, the CPU reads the BIOS start-up configuration information, and the system boots normally;

The failover module receives a switching command from the heartbeat server or an external switch signal and switches between the two identical computer subsystems; on receiving a switching command from the heartbeat server, the failover module sends an enable control signal that puts the switching circuit in the connected state into the floating state or puts the switching circuit in the floating state into the connected state; on receiving a signal from the changeover switch, the failover module sends an enable control signal that puts the currently connected switching circuit into the floating state or puts the currently floating switching circuit into the connected state, and at the same time the failover module sends a switchover notification message to the heartbeat server, which completes the switchover of upper-layer applications and services;

A database mirroring engine ensures that the two identical computer subsystems know each other's working state: the primary subsystem periodically writes important hardware status data and software running data to its local database, the database mirroring engine periodically synchronizes the data to the backup subsystem, and the backup subsystem obtains the data from the primary subsystem at any time by accessing its local database, so that failover proceeds smoothly when the primary subsystem fails;

After the failover occurs, services and applications continue to run on the other machine, and each application can restore its most recent running state from the checkpoint information saved in the database.