CN102708028B - Trusted redundant fault-tolerant computer system - Google Patents

Info

Publication number
CN102708028B
Authority
CN
China
Prior art keywords
subsystem
trusted
module
computer
tcm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210154659.3A
Other languages
Chinese (zh)
Other versions
CN102708028A (en)
Inventor
杨明华
慈林林
陈晓峰
葛根焰
郑建群
杨银刚
杨斌
黄亮
何水发
施鸿程
陈强
李轩涯
程宾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FOURTH RESEARCH INSTITUTE OF SECOND ARTILLERY EQUIPMENT ACADEMY OF PLA
Original Assignee
FOURTH RESEARCH INSTITUTE OF SECOND ARTILLERY EQUIPMENT ACADEMY OF PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FOURTH RESEARCH INSTITUTE OF SECOND ARTILLERY EQUIPMENT ACADEMY OF PLA
Priority to CN201210154659.3A
Publication of CN102708028A
Application granted
Publication of CN102708028B
Active legal status
Anticipated expiration

Links

Landscapes

  • Hardware Redundancy (AREA)

Abstract

The invention provides a trusted redundant fault-tolerant computer system which aims at satisfying the requirement for high safety and high reliability of systems in safety control fields. The trusted redundant fault-tolerant computer system is capable of blocking the operation of illegal programs of malicious codes, viruses and the like, protecting the system and core applications from being destroyed, protecting important information from being revealed, stolen, tampered and ruined, and shielding faults by means of a failure switching function to enable the system to work normally when faults of the system occur. The trusted redundant fault-tolerant computer system is based on a trusted cryptography module (TCM) safety chip, and a dual-computer redundant hot standby method and a compact peripheral component interconnect (CPCI) bus framework are used. Two trusted computer subsystems are configured in a computer case, each of the trusted computer subsystems is composed of a trusted computer main module (including a TCM and a flash disk), a power source module, a flash disk expansion module and an interface expansion module, and the failure switching between the two subsystems is achieved through a heartbeat server and a failure switching module.

Description

A trusted redundant fault-tolerant computer system
Technical field
The present invention relates to a highly trusted, highly reliable computer system and belongs to the technical field of safety control.
Background
With the rapid development of computer technology, computers are widely used as system control equipment in fields such as aviation, aerospace, military affairs and industrial control, and in key areas that require uninterrupted "7 × 24" operation. As engineering systems grow in scale and complexity, an untrusted or unreliable system of this kind can cause heavy losses of life and property. Examples include the series of nuclear safety incidents in Iran in 2011 and the intrusion of malicious code into American UAV ground control systems.
Users require computer control systems to be secure, trusted and lastingly reliable. The introduction of trusted computing technology makes the reliability requirements on the system more pressing, and this matters especially for certain storage devices. The present invention therefore combines trusted computing technology with fault-tolerance technology, which can improve the reliability and security of complex computer control systems to a certain extent.
Summary of the invention
The object of the present invention is to provide a trusted redundant fault-tolerant computer system. A TCM root-of-trust module is added to each subsystem to provide software and hardware protection for the computer and to ensure that it is secure and trusted. System fault tolerance is achieved through redundant backup, which improves the mission reliability of the whole system.
To achieve this object, the invention implants a TCM module in the hardware platform of each subsystem to build a root of trust, and uses a chain-of-trust mechanism to extend trust from the physical hardware layer up to the user application layer, giving the user a trusted execution environment. A card-insertion startup mode establishes a mandatory identity authentication mechanism before system boot at power-on, which prevents the system from being used by unauthorized persons. The cryptographic services provided by the TCM give hardware-level protection to sensitive data that is processed and stored, which prevents malicious users from destroying or stealing confidential data.
For reliability, two trusted computer subsystems are configured in one chassis to form a hot-standby system and achieve redundant fault tolerance. The two subsystems form machines A and B, with heartbeat detection and data synchronization between them. Normally only one machine handles business online; when it fails, the other machine takes over. The heartbeat server is responsible for detecting faults and performing failover. After failover, services and applications continue to run on the other machine, and applications can recover to their most recent running state from the checkpoint information saved in the database. The failover module switches the interface circuits.
Brief description of the drawings
Fig. 1 is a connection diagram of the trusted redundant fault-tolerant computer system of the present invention;
Fig. 2 is a diagram of the adaptation between the trusted computer main module and the TCM module of the present invention;
Fig. 3 is a working principle diagram of the trusted redundant fault-tolerant computer system of the present invention;
Fig. 4 is a schematic diagram of the switching control circuit of the present invention;
Fig. 5 is a flow chart of failover in the trusted redundant fault-tolerant computer system of the present invention;
Fig. 6 is a working diagram of the failover module of the present invention.
Embodiment
Referring to Fig. 1, the system uses a dual-redundant, dual-Active high-availability cluster mode and a CPCI architecture, and the power supply uses a 1+1 redundant mode.
Referring to Fig. 2, in each trusted computer subsystem a medium-scale FPGA is added between the TCM and the mainboard BIOS and CPU. The FPGA performs bus interface conversion and bus switch control among the TCM, the BIOS and the processor system, as follows:
1) The TCM connects to the FPGA over SPI. The FPGA converts SPI to LPC and connects to the BIOS and CPU, which enables active measurement.
2) The FPGA converts the TCM's custom PSRAM bus interface to PCI, so the CPU can access and call the TCM's trusted services over the PCI bus.
3) The CPU reset signal is routed to the FPGA and controlled by the TCM. At power-on the TCM starts first and holds the CPU in reset.
4) The BIOS is attached to the FPGA via LPC, and the FPGA has a bidirectional LPC connection to the CPU. An LPC bus switch inside the FPGA isolates the BIOS from the CPU.
5) After the TCM starts, it first measures the BIOS. Once the measurement passes, the LPC bus switch inside the FPGA closes, the CPU reads the BIOS startup configuration, and the system boots normally.
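The power-on sequence in steps 3) to 5) can be sketched as a small software model. This is an illustration only, not the patent's implementation: the `BootGate` class, the stored reference digest and the use of SHA-256 are assumptions (the text does not say which hash the TCM uses or where the reference value is kept).

```python
import hashlib
from dataclasses import dataclass


@dataclass
class BootGate:
    """Toy model of the FPGA glue logic between the TCM, BIOS and CPU."""
    expected_digest: str              # reference BIOS measurement (assumed to be held by the TCM)
    cpu_in_reset: bool = True         # step 3: the CPU is held in reset at power-on
    lpc_switch_closed: bool = False   # step 4: the BIOS is isolated from the CPU

    def measure_and_release(self, bios_image: bytes) -> bool:
        """Step 5: measure the BIOS and connect it to the CPU only on a match."""
        # SHA-256 is a stand-in hash chosen for this sketch.
        digest = hashlib.sha256(bios_image).hexdigest()
        if digest != self.expected_digest:
            return False              # measurement failed: CPU stays in reset
        self.lpc_switch_closed = True # close the LPC bus switch
        self.cpu_in_reset = False     # release the CPU so it can read the BIOS
        return True
```

The point of the ordering is that the CPU cannot fetch a single BIOS instruction until the measurement has passed, because both the reset line and the LPC path are gated by the same check.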
Trusted computer subsystems A and B run at the same time, but only one of them takes part in business operation at any moment. The subsystem that runs the business is called the primary subsystem; the other, which is powered on but does not run business, is called the backup subsystem. Switching between the primary and backup subsystems can be triggered manually through an external changeover switch, or automatically by the heartbeat mechanism once it determines that the primary subsystem is operating abnormally.
Referring to Fig. 3, in manual triggering the operator presses the external changeover switch as the situation requires to perform failover. When the external changeover switch is open (the default state), the Control terminal is at low level, the relay connects the internal control signal to GND, and the switch-over control signal output is low. When the external changeover switch is closed, the Control terminal is at high level, the relay connects the internal control signal to +5V, and the switch-over control signal output is high. The failover module switches the interface circuits according to the external switch-over control signal it receives.
Referring to Fig. 4, a heartbeat server is configured in each trusted computer subsystem. The heartbeat server runs in kernel space as a system-level process. It monitors in real time the running state (running or hung) of local applications and services, listens for heartbeats sent by the peer system's heartbeat server, and sends its own heartbeats to the peer. Based on the results of local and peer heartbeat detection, the heartbeat server can wake or suspend the relevant applications or services. To improve the reliability of the heartbeat path, the system uses two heartbeat detection paths: UDP over the network port and a COM port.
Furthermore, to improve the accuracy of heartbeat detection and to avoid unnecessary switchovers caused by heartbeat timeouts when the network is congested or the system is too busy, the heartbeat server combines PUSH and PULL detection. Normally the two sides detect each other's heartbeats in PUSH mode. When no heartbeat from the peer is heard, the server automatically changes to PULL mode and actively queries the peer to check the heartbeat further.
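The combined PUSH and PULL check can be sketched as follows. This is a minimal model under stated assumptions: the `HeartbeatMonitor` class, the timeout, the probe count and the `probe` callback are hypothetical names and values, and the transport (UDP or COM) is left outside the sketch.

```python
from typing import Callable, Optional


class HeartbeatMonitor:
    """Tracks peer liveness: passive PUSH listening with an active PULL fallback."""

    def __init__(self, timeout_s: float, probe: Callable[[], bool], max_probes: int = 3):
        self.timeout_s = timeout_s      # silence tolerated in PUSH mode (assumed value)
        self.probe = probe              # PULL query; returns True if the peer answers
        self.max_probes = max_probes    # probes tried before declaring a fault (assumed)
        self.last_seen: Optional[float] = None

    def on_heartbeat(self, now: float) -> None:
        """Record a heartbeat pushed by the peer on either path."""
        self.last_seen = now

    def peer_alive(self, now: float) -> bool:
        """Return False only if PUSH has timed out and every PULL probe fails."""
        if self.last_seen is not None and now - self.last_seen <= self.timeout_s:
            return True                 # PUSH mode: a recent heartbeat was heard
        for _ in range(self.max_probes):
            if self.probe():            # PULL mode: ask the peer directly
                self.last_seen = now
                return True
        return False
```

A missed PUSH heartbeat alone never triggers failover in this model; the peer is declared faulty only after the active queries also go unanswered, which is the behaviour the paragraph above describes.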
To ensure that the backup subsystem can take over control smoothly when the primary subsystem fails, a database mirroring engine keeps the two subsystems aware of each other's working state. The primary subsystem periodically writes the system's important hardware status data and software running data to its local database. The database mirroring engine periodically synchronizes this data to the backup subsystem. The backup subsystem can then obtain the primary subsystem's data at any time by accessing its own local database, so that failover proceeds smoothly when the primary subsystem fails.
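The write, mirror and read cycle can be illustrated with a small sketch. The patent does not name a database or a mirroring protocol, so everything here is an assumption: SQLite in-memory databases stand in for the two local databases, the `state` table layout is hypothetical, and `Connection.backup` (a full copy) stands in for the mirroring engine.

```python
import json
import sqlite3


def open_state_db() -> sqlite3.Connection:
    """Create a local state database (in memory, for illustration only)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value TEXT)")
    return db


def write_state(db: sqlite3.Connection, key: str, value) -> None:
    """Primary side: record one piece of hardware or software state."""
    db.execute("INSERT OR REPLACE INTO state VALUES (?, ?)", (key, json.dumps(value)))
    db.commit()


def mirror(primary_db: sqlite3.Connection, backup_db: sqlite3.Connection) -> None:
    """Mirroring step: copy the primary's database over the backup's."""
    primary_db.backup(backup_db)


def read_state(db: sqlite3.Connection, key: str):
    """Backup side: read mirrored state from the local database."""
    row = db.execute("SELECT value FROM state WHERE key = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else None
```

In a real deployment `write_state` and `mirror` would each run on a timer; the backup only ever reads its own copy, so it needs no access to the primary at the moment of failure.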
The primary and backup subsystems do not have equal control over the system. While the primary subsystem runs the business, the backup subsystem outputs no control, and the primary subsystem fully controls system operation. Only when failover occurs does the backup subsystem quickly use the stored system state data to restore the running state and take over control of the system, so that the business keeps running continuously and stably.
Referring to Fig. 5, when the primary subsystem fails, the heartbeat server is responsible for recovering the services and applications that were running on the primary subsystem onto the backup subsystem. The recovery proceeds as follows:
1) Using the list of services recorded as running on the primary subsystem before the fault, the backup subsystem starts or wakes the corresponding services, so that services recover smoothly.
2) Using the list of applications recorded as running on the primary subsystem before the fault, the backup subsystem starts or wakes the corresponding applications. Each application can recover to its most recent running state from the checkpoint information saved in the database before the fault.
3) The failover module connects the backup subsystem to the external interfaces.
4) The backup subsystem becomes the primary subsystem.
5) Once the failed subsystem has been repaired, it rejoins as the backup subsystem and performs the backup function.
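The five recovery steps can be sketched as two small functions. This is an illustrative model only: the `Subsystem` record, the role strings, the checkpoint dictionary and the `connect_interfaces` callback (standing in for the failover module) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Subsystem:
    name: str
    role: str                                   # "primary", "backup" or "failed"
    running: List[str] = field(default_factory=list)


def fail_over(failed: Subsystem, backup: Subsystem,
              services: List[str], apps: List[str],
              checkpoints: Dict[str, str],
              connect_interfaces: Callable[[str], None]) -> Dict[str, str]:
    """Run recovery steps 1 to 4 and return the checkpoint each app resumes from."""
    backup.running.extend(services)             # step 1: start or wake recorded services
    backup.running.extend(apps)                 # step 2: start or wake recorded applications
    resumed = {app: checkpoints[app] for app in apps if app in checkpoints}
    connect_interfaces(backup.name)             # step 3: switch the external interfaces
    backup.role = "primary"                     # step 4: the backup becomes the primary
    failed.role = "failed"
    failed.running.clear()
    return resumed


def rejoin(repaired: Subsystem) -> None:
    """Step 5: a repaired subsystem comes back as the backup."""
    repaired.role = "backup"
```

The service and application lists and the checkpoints are the data the backup already holds in its local database, which is why the function takes them as arguments instead of asking the failed machine.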
Referring to Fig. 6, the failover module switches the external interfaces of trusted computer subsystems A and B. It consists mainly of a control module, interface signal switching circuit A and interface signal switching circuit B. The switching circuits are redundant: there are two identical circuits, with the external interfaces of subsystem A connected to switching circuit A and the external interfaces of subsystem B connected to switching circuit B. Switching circuit A is powered by the power supply of subsystem A, and switching circuit B by the power supply of subsystem B. The control module receives switching commands from the heartbeat server or the external switch signal and performs the interface switch, as follows:
1) On receiving a switching command from the heartbeat server, the control module issues enable control signals that put the currently connected switching circuit into the floating state and the currently floating switching circuit into the connected state.
2) On receiving a signal from the changeover switch, the control module issues enable control signals that put the currently connected switching circuit into the floating state and the currently floating switching circuit into the connected state. At the same time, the control module sends a switch notification message to the heartbeat server, which completes the switching of upper-layer applications and services.
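The two cases differ only in who already knows about the switch, which a short sketch makes concrete. The `FailoverModule` class, its method names and the choice of circuit A as the initially connected one are assumptions made for illustration.

```python
from typing import Callable


class FailoverModule:
    """Toy control module: exactly one switching circuit is connected, the other floats."""

    def __init__(self, notify_heartbeat_server: Callable[[str], None]):
        self.connected = "A"                  # assumed initial state: circuit A connected
        self.notify = notify_heartbeat_server

    def _swap(self) -> None:
        """Float the connected circuit and connect the floating one."""
        self.connected = "B" if self.connected == "A" else "A"

    def on_server_command(self) -> None:
        """Case 1: the heartbeat server asked for the switch, so no notification is needed."""
        self._swap()

    def on_switch_signal(self) -> None:
        """Case 2: manual switch; tell the heartbeat server so it can move applications."""
        self._swap()
        self.notify(self.connected)
```

In case 1 the heartbeat server initiated the failover and handles the upper layers itself; in case 2 the hardware switched first, so the notification is what lets services and applications follow the interfaces.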

Claims (1)

1. A trusted redundant fault-tolerant computer, characterized in that it comprises two identical computer subsystems that back each other up redundantly, a heartbeat server, a failover module, and a trusted cryptography module (TCM);
The trusted cryptography module TCM is implanted in the hardware platform of each computer subsystem and extends trust from the physical hardware layer of the computer subsystem to the user application layer through a chain-of-trust mechanism; a card-insertion mode establishes a mandatory identity authentication mechanism before system boot; the cryptographic services provided by the trusted cryptography module TCM give hardware-level protection to sensitive data that is processed and stored; the TCM is connected to an FPGA via SPI, which the FPGA converts to LPC for connection to the BIOS and CPU, achieving active measurement; the TCM is connected to the FPGA via PSRAM, which the FPGA converts to PCI for connection to the CPU, and the CPU accesses and calls the trusted services of the trusted cryptography module TCM over the PCI bus;
After the trusted cryptography module TCM starts, it first measures the BIOS; once the measurement passes, the LPC bus switch inside the FPGA closes, the CPU reads the startup configuration information of the BIOS, and the system boots normally;
The failover module receives a switching command from the heartbeat server or an external switch signal and switches between the two identical computer subsystems; on receiving a switching command from the heartbeat server, the failover module issues an enable control signal that puts the switching circuit in the connected state into the floating state or puts the switching circuit in the floating state into the connected state; on receiving a signal from the changeover switch, the failover module issues an enable control signal that puts the currently connected switching circuit into the floating state or puts the currently floating switching circuit into the connected state, and at the same time the failover module sends a switch notification message to the heartbeat server, which completes the switching of upper-layer applications and services;
A database mirroring engine keeps the two identical computer subsystems aware of each other's working state: the primary subsystem periodically writes the system's important hardware status data and software running data to its local database, the database mirroring engine periodically synchronizes the data to the backup subsystem, and the backup subsystem obtains the data from the primary subsystem at any time by accessing its local database, so that failover proceeds smoothly when the primary subsystem fails;
After the failover occurs, services and applications continue to run on the other machine, and applications can recover to their most recent running state from the checkpoint information saved in the database.
CN201210154659.3A 2012-05-18 2012-05-18 Trusted redundant fault-tolerant computer system Active CN102708028B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210154659.3A CN102708028B (en) 2012-05-18 2012-05-18 Trusted redundant fault-tolerant computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210154659.3A CN102708028B (en) 2012-05-18 2012-05-18 Trusted redundant fault-tolerant computer system

Publications (2)

Publication Number Publication Date
CN102708028A (en) 2012-10-03
CN102708028B (en) 2015-01-07

Family

ID=46900836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210154659.3A Active CN102708028B (en) 2012-05-18 2012-05-18 Trusted redundant fault-tolerant computer system

Country Status (1)

Country Link
CN (1) CN102708028B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103309713A (en) * 2013-06-25 2013-09-18 北京小米科技有限责任公司 System upgrading method, device and equipment
CN104199517A (en) * 2014-09-03 2014-12-10 山东超越数控电子有限公司 Heterogeneous trusted redundant server system based on domestic processor
CN107844449B (en) * 2016-09-20 2021-02-09 深圳中电长城信息安全系统有限公司 Method and system for processing communication protocol by Feiteng platform
CN108268286A (en) * 2016-12-29 2018-07-10 联想(上海)信息技术有限公司 Computer system starting method and computer system
CN106527409B (en) * 2016-12-29 2019-01-29 中车株洲电力机车研究所有限公司 A kind of master control cabinet
CN106991329A (en) * 2017-03-31 2017-07-28 山东超越数控电子有限公司 A kind of trust calculation unit and its operation method based on domestic TCM
CN108594635B (en) * 2018-04-13 2021-06-29 成都赫尔墨斯科技股份有限公司 Device and method for data comprehensive display control in avionics system
CN110750794B (en) * 2019-10-24 2022-03-22 长城信息股份有限公司 BIOS (basic input output System) safe starting method and system
CN112084135A (en) * 2020-09-18 2020-12-15 西安超越申泰信息科技有限公司 High-reliability computer based on domestic processor
CN114579983B (en) * 2022-04-26 2022-09-09 阿里云计算有限公司 Method and device for acquiring trusted information and trusted server

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101281483A (en) * 2008-05-12 2008-10-08 北京邮电大学 Double-machine redundant tolerant system and redundant switching method thereof
CN101281577A (en) * 2008-05-16 2008-10-08 北京工业大学 Dependable computing system capable of protecting BIOS and method of use thereof
CN201820230U (en) * 2010-01-22 2011-05-04 华北计算技术研究所 Computer and trusted-computing trusted root equipment for same

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8046660B2 (en) * 2006-08-07 2011-10-25 Marvell World Trade Ltd. System and method for correcting errors in non-volatile memory using product codes

Also Published As

Publication number Publication date
CN102708028A (en) 2012-10-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee
CP02 Change in the address of a patent holder

Address after: Seven, Room 204, unit 109, 100094 North Qing Lu, Beijing, Haidian District

Patentee after: The Fourth Research Institute of the Second Artillery Equipment Academy of PLA

Address before: 100085 Qinghe building, Haidian District, Beijing nine

Patentee before: The Fourth Research Institute of the Second Artillery Equipment Academy of PLA

CP03 Change of name, title or address

Address after: Room 6, unit 25, 109 Beiqing Road, Haidian District, Beijing 100094

Patentee after: Research Institute of penetration and defense, rocket Army Research Institute, PLA

Address before: Seven, Room 204, unit 109, 100094 North Qing Lu, Beijing, Haidian District

Patentee before: FOURTH INSTITUTE OF THE SECOND ARTILLERY EQUIPMENT ACADEMY, PLA

CP03 Change of name, title or address