CN112084135A - High-reliability computer based on domestic processor

Publication number
CN112084135A
CN112084135A CN202010984124.3A CN202010984124A CN112084135A CN 112084135 A CN112084135 A CN 112084135A CN 202010984124 A CN202010984124 A CN 202010984124A CN 112084135 A CN112084135 A CN 112084135A
Authority
CN
China
Prior art keywords
switching
computing unit
computer
module
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010984124.3A
Other languages
Chinese (zh)
Inventor
杨坤龙
耿士华
沈忱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Chaoyue Shentai Information Technology Co Ltd
Original Assignee
Xian Chaoyue Shentai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Chaoyue Shentai Information Technology Co Ltd
Priority to CN202010984124.3A
Publication of CN112084135A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention provides a high-reliability computer based on a domestic processor, belonging to the technical field of domestic high-reliability ruggedized computers. The computer contains at least 2 computing units and 1 switching unit; each computing unit is configured with high-availability cluster management software, and the switching unit switches the external interfaces. This can greatly improve the stability of the computer.

Description

High-reliability computer based on domestic processor
Technical Field
The invention relates to the field of domestic high-reliability ruggedized computers, and in particular to a high-reliability computer based on a domestic processor.
Background
In specialized industries, the stability of a computer is vitally important. In practice, computer hardware and software inevitably fail: investigation shows that hardware faults (CPU, memory, disk, and so on), software faults (operating system, client applications, and so on), and human operating errors are all frequently encountered in actual work. If a fault occurs in a critical situation, the consequences can be immeasurable.
At present, most production service systems are deployed in a single-machine environment, and each system supports important daily services for its users, so there is a risk of a single point of failure.
Disclosure of Invention
To solve the above technical problems, the invention provides a high-reliability computer based on a domestic processor.
The technical scheme of the invention is as follows:
A high-reliability computer based on a domestic processor,
in which at least 2 computing units and 1 switching unit are arranged inside the computer, and each computing unit is configured with high-availability cluster management software; the switching unit switches the external interfaces.
Further,
high-availability cluster management software is deployed on the computing units, and application data is synchronized to the backup computing unit in real time, using data-based real-time replication, over two gigabit network heartbeat links between the computing units, thereby providing users with a complete application protection solution.
When a fault is detected, the service system is automatically switched to the backup computing unit; at the same time, a driver in the system controls a CPU GPIO to send a switching command to the switching unit, which then completes the switching control of the interfaces.
The switching unit has two circuits: a switching control circuit and an interface switching circuit.
Further,
the switching control circuit receives the heartbeat signal of each computing unit through GPIO to judge the state of that computing unit, and controls the interface switching circuit to switch the external interfaces when a computing unit fails.
Further,
the interface switching circuit switches the external interfaces according to instructions from the switching control circuit; when a switching request signal is received, it quickly completes the switchover between computing units, ensuring that the external interfaces always remain connected to the main computing unit.
Further,
the computing unit adopts an LS3A3000+7A1000 architecture and outputs network, USB, serial-port, and video signals externally.
Further,
the high-availability software consists of four software parts: a driver module, an agent module, a service module, and a management module;
wherein:
the driver module is embedded in the system kernel, monitors the system in real time, filters out the useful data and passes it to the service module for backup, and notifies the switching unit to switch the external interfaces by controlling a CPU GPIO signal, so that the user always works on the main computing unit;
the agent module monitors client applications and passes information to the service module;
the service module is responsible for executing commands from users/software, transferring application and user data, ensuring that the source data is consistent with the backup data, handling faults, and so on, and must run in cooperation with the driver module;
the management module provides an interface-based management platform on which the working state of the computing units, hardware information, and so on can be checked in real time, and it records the backup log.
The invention has the following advantages:
The computing units are configured with high-availability cluster management software and cooperate with the switching unit to provide redundant hot standby between two computing units, which can greatly improve the stability of the computer.
Drawings
FIG. 1 is a schematic workflow diagram of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below with reference to the drawings. The described embodiments are some, but not all, embodiments of the present invention; all other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
On the basis of the existing computer, a corresponding physical computer is added and a dual-machine high-availability cluster solution is deployed, ensuring real-time backup of core database data and continuous operation of services.
High-reliability ruggedized computers employ redundant computing units so that the computer information system provides uninterrupted service and the system remains available.
The invention provides a high-reliability computer based on a domestic processor. The computer must contain at least 2 computing units and 1 switching unit internally; each computing unit is configured with high-availability cluster management software and cooperates with the switching unit to provide redundant hot standby between two computing units, which can greatly improve the stability of the computer.
High-availability cluster management software is deployed on the computing units, and application data is synchronized to the backup computing unit in real time, using data-based real-time replication, over two gigabit network heartbeat links between the computing units, thereby providing the user with a complete application protection solution.
When a hardware, software, OS, network, or other fault is detected, the service system is automatically switched to the backup computing unit; at the same time, a driver in the system controls a CPU GPIO to send a switching command to the switching unit, which then completes the switching control of the interfaces.
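The failover sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `HeartbeatMonitor` and `FailoverController`, the link labels, the 3-second timeout, and the `gpio_write` callback are all hypothetical, and the sketch reacts only to loss of heartbeat, whereas the text also lists hardware, software, OS, and network faults.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat time seen on each of two redundant links."""

    timeout_s: float = 3.0  # assumed value; the source gives no timeout
    last_seen: dict = field(
        default_factory=lambda: {"link0": 0.0, "link1": 0.0}
    )

    def record(self, link: str, now: float) -> None:
        """Note that a heartbeat arrived on `link` at time `now`."""
        self.last_seen[link] = now

    def peer_alive(self, now: float) -> bool:
        """The peer counts as alive while at least one link is fresh."""
        return any(now - t <= self.timeout_s for t in self.last_seen.values())


class FailoverController:
    """Runs on the backup computing unit and takes over when the peer is lost."""

    def __init__(self, monitor: HeartbeatMonitor, gpio_write):
        self.monitor = monitor
        self.gpio_write = gpio_write  # hypothetical hook that drives the CPU GPIO
        self.role = "backup"

    def tick(self, now: float) -> str:
        """Check the peer once; promote this unit if both links are stale."""
        if self.role == "backup" and not self.monitor.peer_alive(now):
            self.role = "primary"
            self.gpio_write(1)  # ask the switching unit to move the interfaces
        return self.role
```

Requiring both links to go stale before promoting the backup is a design choice of this sketch: with two heartbeat links, a single failed cable does not trigger an unnecessary switchover.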
The switching unit switches the video interface, the USB interface, the serial port, and other external interfaces. The switching board carries two circuits: a switching control circuit and an interface switching circuit;
wherein:
the main function of the switching control circuit is to receive the heartbeat signal of each computing unit through GPIO, judge the state of that computing unit, and control the interface switching circuit to switch the external interfaces when a computing unit fails;
the main function of the interface switching circuit is to switch the external interfaces according to instructions from the switching control circuit. When a switching request signal is received, it quickly completes the switchover between computing units, ensuring that the external interfaces always remain connected to the main computing unit.
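The division of labor between the two circuits can be modeled in software as below. This is a hedged sketch only: the real switching unit is hardware (a microcontroller plus switching chips), and the class names, the sampling scheme, and the `max_missed` threshold of 3 are assumptions introduced for illustration.

```python
class InterfaceSwitch:
    """Stand-in for the interface switching circuit.

    Routes every external port to exactly one computing unit.
    """

    PORTS = ("video", "usb", "serial")

    def __init__(self, active_unit: int = 0):
        self.routes = {port: active_unit for port in self.PORTS}

    def select(self, unit: int) -> None:
        """Connect all external ports to the given computing unit."""
        for port in self.PORTS:
            self.routes[port] = unit


class SwitchControl:
    """Stand-in for the switching control circuit for two computing units."""

    def __init__(self, switch: InterfaceSwitch, max_missed: int = 3):
        self.switch = switch
        self.max_missed = max_missed  # assumed threshold, not from the source
        self.active = 0
        self.missed = [0, 0]

    def sample(self, pulses) -> int:
        """Process one sampling period.

        pulses[i] is True if unit i toggled its heartbeat GPIO in this period.
        Returns the index of the unit that currently owns the interfaces.
        """
        for unit, seen in enumerate(pulses):
            self.missed[unit] = 0 if seen else self.missed[unit] + 1
        standby = 1 - self.active
        # Switch only when the active unit looks dead and the standby is healthy.
        if self.missed[self.active] >= self.max_missed and self.missed[standby] == 0:
            self.request_switch(standby)
        return self.active

    def request_switch(self, unit: int) -> None:
        """Handle a switching request, whether self-detected or commanded."""
        self.active = unit
        self.switch.select(unit)
```

In this model, `request_switch` is also the entry point a computing unit would reach when it sends a switching command over its CPU GPIO, so both trigger paths described in the text end in the same routing change.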
The computing unit adopts an LS3A3000+7A1000 architecture and outputs network, USB, serial-port, and video signals externally. The switching unit uses a microcontroller and a switching chip to perform interface switching.
The high-availability software consists of four software parts: a driver module, an agent module, a service module, and a management module.
Wherein:
the driver module is embedded in the system kernel, monitors the system in real time, filters out the useful data and passes it to the service module for backup, and notifies the switching unit to switch the external interfaces by controlling a CPU GPIO signal, so that the user always works on the main computing unit;
the agent module monitors client applications and passes information to the service module;
the service module is responsible for executing commands from users/software, transferring application and user data, ensuring that the source data is consistent with the backup data, handling faults, and so on, and must run in cooperation with the driver module;
the management module provides an interface-based management platform on which the working state of the computing units, hardware information, and so on can be checked in real time, and it records the backup log.
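The way the four modules hand data to one another can be illustrated with the sketch below. It is a simplified user-space model under stated assumptions: the real driver module lives in the kernel, and the class interfaces, the path-prefix filter, and the in-memory dictionaries standing in for the source and backup stores are all hypothetical.

```python
class ServiceModule:
    """Moves data to the backup and keeps source and backup consistent."""

    def __init__(self):
        self.source = {}
        self.backup = {}
        self.app_status = {}
        self.log = []

    def replicate(self, key, value):
        """Apply a write to the source and mirror it to the backup store."""
        self.source[key] = value
        self.backup[key] = value
        self.log.append(f"backup {key}")

    def report_app(self, name, healthy):
        """Store the latest health report received from the agent module."""
        self.app_status[name] = healthy

    def consistent(self):
        """True when the backup holds the same data as the source."""
        return self.source == self.backup


class DriverModule:
    """Filters observed writes and forwards only the protected data."""

    def __init__(self, service, protected_prefix):
        self.service = service
        self.protected_prefix = protected_prefix

    def on_write(self, path, data):
        """Return True if the write was forwarded for backup."""
        if not path.startswith(self.protected_prefix):
            return False
        self.service.replicate(path, data)
        return True


class AgentModule:
    """Monitors a client application and reports to the service module."""

    def __init__(self, service):
        self.service = service

    def check(self, name, probe):
        """Run a caller-supplied health probe and report its result."""
        self.service.report_app(name, bool(probe()))


class ManagementModule:
    """Read-only view of working state and the backup log."""

    def __init__(self, service):
        self.service = service

    def status(self):
        """Summarize consistency, application health, and log size."""
        return {
            "consistent": self.service.consistent(),
            "apps": dict(self.service.app_status),
            "log_entries": len(self.service.log),
        }
```

For example, a write to `/data/db.bin` with the prefix `/data/` would be forwarded and logged, while a write to `/tmp/scratch` would be filtered out by the driver module and never reach the backup.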
The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A high-reliability computer based on a domestic processor, characterized in that
at least 2 computing units and 1 switching unit are arranged inside the computer, and each computing unit is configured with high-availability cluster management software; the switching unit switches the external interfaces.
2. The computer of claim 1,
wherein high-availability cluster management software is deployed on the computing units, and application data is synchronized to the backup computing unit in real time, using data-based real-time replication, over two gigabit network heartbeat links between the computing units, thereby providing users with a complete application protection solution.
3. The computer of claim 2,
wherein, when a fault is detected, the service system is automatically switched to the backup computing unit; at the same time, a driver in the system controls a CPU GPIO to send a switching command to the switching unit, which then completes the switching control of the interfaces.
4. The computer of claim 1,
wherein the switching unit has two circuits: a switching control circuit and an interface switching circuit.
5. The computer of claim 4,
wherein the switching control circuit receives the heartbeat signal of each computing unit through GPIO to judge the state of that computing unit, and controls the interface switching circuit to switch the external interfaces when a computing unit fails.
6. The computer of claim 4,
wherein the interface switching circuit switches the external interfaces according to instructions from the switching control circuit; when a switching request signal is received, it quickly completes the switchover between computing units, ensuring that the external interfaces always remain connected to the main computing unit.
7. The computer of claim 2 or 3,
wherein the computing unit adopts an LS3A3000+7A1000 architecture and outputs network, USB, serial-port, and video signals externally.
8. The computer of claim 1,
wherein the high-availability software consists of four software parts: a driver module, an agent module, a service module, and a management module;
wherein:
the driver module is embedded in the system kernel, monitors the system in real time, filters out the useful data and passes it to the service module for backup, and notifies the switching unit to switch the external interfaces by controlling a CPU GPIO signal, so that the user always works on the main computing unit;
the agent module monitors client applications and passes information to the service module;
the service module is responsible for executing commands from users/software, transferring application and user data, ensuring that the source data is consistent with the backup data, handling faults, and so on, and must run in cooperation with the driver module;
the management module provides an interface-based management platform on which the working state of the computing units, hardware information, and so on can be checked in real time, and it records the backup log.
CN202010984124.3A 2020-09-18 2020-09-18 High-reliability computer based on domestic processor Pending CN112084135A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010984124.3A CN112084135A (en) 2020-09-18 2020-09-18 High-reliability computer based on domestic processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010984124.3A CN112084135A (en) 2020-09-18 2020-09-18 High-reliability computer based on domestic processor

Publications (1)

Publication Number Publication Date
CN112084135A (en) 2020-12-15

Family

ID=73736579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010984124.3A Pending CN112084135A (en) 2020-09-18 2020-09-18 High-reliability computer based on domestic processor

Country Status (1)

Country Link
CN (1) CN112084135A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708028A (en) * 2012-05-18 2012-10-03 中国人民解放军第二炮兵装备研究院第四研究所 Trusted redundant fault-tolerant computer system
CN103106126A (en) * 2013-01-16 2013-05-15 浪潮电子信息产业股份有限公司 High-availability computer system based on virtualization
US20160283335A1 (en) * 2015-03-24 2016-09-29 Xinyu Xingbang Information Industry Co., Ltd. Method and system for achieving a high availability and high performance database cluster
CN106815093A (en) * 2015-11-30 2017-06-09 北京宇航系统工程研究所 A kind of computer glitch fault tolerance facility based on interconnection between domestic Loongson processor
CN110784350A (en) * 2019-10-25 2020-02-11 北京计算机技术及应用研究所 Design method of real-time available cluster management system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination