CN113127270B - Cloud computing-based 2-out-of-3 secure computer platform

Publication number
CN113127270B
Authority
CN
China
Prior art keywords
host
data
synchronization
mode
computer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110355059.2A
Other languages
Chinese (zh)
Other versions
CN113127270A (en
Inventor
唐涛
朱力
李松
王悉
王洪伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University
Priority to CN202110355059.2A
Publication of CN113127270A
Application granted
Publication of CN113127270B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1629 Error detection by comparing the output of redundant processing systems
    • G06F11/1633 Error detection by comparing the output of redundant processing systems using mutual exchange of the output between the redundant processing components
    • G06F11/165 Error detection by comparing the output of redundant processing systems with continued operation after detection of the error
    • G06F11/1675 Temporal synchronisation or re-synchronisation of redundant processing components
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023 Failover techniques
    • G06F11/2028 Failover techniques eliminating a faulty processor or activating a spare
    • G06F11/203 Failover techniques using migration
    • G06F11/2035 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant without idle spare hardware

Abstract

The invention provides a 2-out-of-3 secure computer platform based on cloud computing. The platform has a layered architecture comprising, from top to bottom, a cloud management center, service nodes, secure computer virtualization containers and physical infrastructure. There is one cloud management center, and each service node is a host. The cloud management center carries out signaling and data communication with each of three hosts; the hosts, the secure computer virtualization containers and the physical infrastructures correspond one to one; each host carries out data communication with its corresponding container, and each container carries out data communication with its corresponding physical infrastructure. The application and its running environment are packaged in a container, making them lightweight and easy to migrate and deploy; the distributed cloud management center provides real-time monitoring and resource scheduling of the underlying physical service nodes, platform self-diagnosis, immediate fault recovery, and inheritance of historical variables and state data; besides the basic functions of a 2-out-of-3 secure computer, the platform can also support the development of peripheral applications.

Description

Cloud computing-based 2-out-of-3 secure computer platform
Technical Field
The invention relates to the technical field of secure computers, and in particular to a 2-out-of-3 secure computer platform based on cloud computing.
Background
Secure (safety) computer technology is used in fields such as rail transit and aerospace. It guarantees the correctness of the inputs, outputs and intermediate states of equipment or applications, and in most cases adopts multi-module redundancy.
In the field of rail transit, both ground equipment and on-board equipment are built from safety computers. If a device fails suddenly for physical or other reasons, another set of systems or an emergency handling scheme is needed to record the fault state in time and restore the device to a safe condition. That is, the fail-safe principle must be followed: on failure, the system must be driven to a safe state.
In terms of architecture, a secure computer platform generally adopts either a dual-channel structure (double 2-out-of-2) or a multi-channel structure (2-out-of-3). The channels monitor one another and vote on their respective inputs and outputs to determine whether each channel is normal or abnormal. The architecture mainly comprises a data communication module, an inter-channel synchronization module, and an input/output 2-out-of-3 voting module.
At present, the hardware-and-software 2-out-of-3 secure computer platforms of the prior art have the following defects:
1) The redundancy design concept increases the number of boards or hosts.
The SICAS system of Siemens (Germany) and the SelTrac system of Thales (France) are both based on a 2-out-of-3 structure, and both contain safety computers designed on the redundancy principle. Multi-channel redundancy inevitably multiplies the number of boards or hosts, so that a complete set of safety computer equipment occupies one or more full cabinets.
2) Boards are bound to software, so a failure of either the hardware or the software can cause the secure computer to lose its function.
The hardware of a typical 2-out-of-3 safety computer mainly comprises a CPU module, a memory module, a power module, peripheral circuits and other modules. A physical failure of any of these modules increases the probability that the secure computer's function fails.
3) Maintenance and replacement interrupt the application service.
Secure computer platform hardware has a finite mean time to failure, i.e. a limited lifetime. Once equipment fails or hardware ages, the time needed for maintenance and replacement tends to disable part of the secure computer's functions, which interrupts the application service.
Disclosure of Invention
The embodiments of the invention provide a 2-out-of-3 secure computer platform based on cloud computing, so as to overcome the problems of the prior art.
In order to achieve the above purpose, the invention adopts the following technical solution.
A 2-out-of-3 secure computer platform based on cloud computing comprises a cloud management center, service nodes, secure computer virtualization containers and physical infrastructure, arranged as a layered architecture in that order from top to bottom. There is one cloud management center, and each service node is a host. The cloud management center carries out signaling and data communication with each of three hosts; the hosts correspond one to one with the secure computer virtualization containers, and the containers correspond one to one with the physical infrastructures; each host is in data communication with its corresponding container, and each container is in data communication with its corresponding physical infrastructure.
Preferably, the three host structures operate independently, form a loosely coupled redundant structure based on task-level synchronization, and exchange data through virtual network technology; a 2-out-of-3 voting mechanism is used among the three host structures, and only the host in master mode may send information to external devices.
Preferably, the cloud management center has a distributed structure, so that it supports geographic disaster recovery and guards against single points of failure, and monitoring of the service nodes and user application processes is not interrupted; if the communication link between any two hosts is interrupted, data is forwarded through the third host, so that data voting continues normally.
Preferably, after the distributed cloud management center and the three service nodes are deployed, the configuration environment and the software body required by the application are packaged into an image by container virtualization technology, the image is deployed on the cloud computing platform, and the application container of the secure computer platform is started from the image; the image can be migrated at any time and the application started from it.
Preferably, each host preempts its master/standby priority according to the power-on sequence, and when a fault and a recovery occur, the master/standby priorities of the three hosts are updated according to the initial state and the identity switching strategy.
The host has the following five working modes:
1) Power-on mode: the host is in the power-on startup stage; after powering on, it sends synchronization requests to the other two hosts. The host that powers on first receives the most synchronization requests and takes the master working mode;
2) Master working mode: the host is in a normal working state, its calculation result agrees with that of at least one other host, and its calculation result is used as the sole output of the whole system;
3) Standby working mode: the host is in a normal working state and its calculation result agrees with that of at least one other host, but it does not output the result externally;
4) Following mode: the host has been powered on again after a fault; once the identity policy has been executed, it enters the following mode. In this mode the host waits for the historical state information sent by the host in the master working mode, inherits (learns) the historical data, and then enters the standby working mode;
5) Reset mode: the host enters the reset mode when it fails or when its voting result is inconsistent with that of the other two hosts.
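The five working modes can be pictured as a small state machine. The following is a minimal sketch only: the event names and the transition table are assumptions read off the mode descriptions above, not the identity switching flow of Fig. 2, and the promotion of a standby host to master is deliberately left out because this section does not specify it.

```python
from enum import Enum


class Mode(Enum):
    POWER_ON = "power-on"
    MASTER = "master"
    STANDBY = "standby"
    FOLLOWING = "following"
    RESET = "reset"


# Hypothetical event names; the transitions are an assumption derived
# from the five mode descriptions, not the patent's Fig. 2.
TRANSITIONS = {
    (Mode.POWER_ON, "first_powered_on"): Mode.MASTER,
    (Mode.POWER_ON, "later_powered_on"): Mode.STANDBY,
    (Mode.MASTER, "fault_or_vote_mismatch"): Mode.RESET,
    (Mode.STANDBY, "fault_or_vote_mismatch"): Mode.RESET,
    (Mode.RESET, "identity_policy_done"): Mode.FOLLOWING,
    (Mode.FOLLOWING, "history_inherited"): Mode.STANDBY,
}


def next_mode(mode, event):
    """Return the next mode; unknown events leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)


def can_process(mode):
    """Only master and standby hosts provide normal application processing."""
    return mode in (Mode.MASTER, Mode.STANDBY)


def may_output(mode):
    """Only the master host may send results to external devices."""
    return mode is Mode.MASTER
```

For example, a host that fails while in master mode would pass through RESET and FOLLOWING and come back as STANDBY under these assumed transitions.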
Preferably, in the power-on mode, the host follows the synchronization decision logic truth table shown in Table 2:
TABLE 2
Synchronization requests received | Synchronization signals received | Synchronization result
2 | 0 | Synchronization succeeds; the host is the first host powered on
1 | 1 | Synchronization succeeds; the host is the second host powered on
0 | 1 | Synchronization succeeds; the host is the third host powered on
0 | 0 | Synchronization fails
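The truth table can be read as a simple lookup from the two counters to a power-on order. The sketch below is illustrative; the function and constant names are hypothetical, and treating every combination not listed in Table 2 as a failure is an assumption (the table itself only lists 0/0 as a failure).

```python
# (synchronization requests received, synchronization signals received)
# -> power-on order of this host, as listed in Table 2.
SYNC_TRUTH_TABLE = {
    (2, 0): 1,  # first host powered on
    (1, 1): 2,  # second host powered on
    (0, 1): 3,  # third host powered on
}


def power_on_order(requests_received, signals_received):
    """Return 1, 2 or 3, or None when synchronization fails.

    Assumption: any combination missing from Table 2 is treated
    like the 0/0 row, i.e. as a synchronization failure.
    """
    return SYNC_TRUTH_TABLE.get((requests_received, signals_received))


def is_master(requests_received, signals_received):
    """The first host powered on takes the master working mode."""
    return power_on_order(requests_received, signals_received) == 1
```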
Preferably, after a host starts it enters the power-on mode. In the power-on mode, the hosts of the 2-out-of-3 safety redundant system first perform initial power-on synchronization: each host sends a synchronization request to the other two hosts when it starts and counts the synchronization requests it receives, and the host that has received the most synchronization requests becomes the host in master mode;
the master-mode host sends a synchronization signal to the other two hosts to start a task period, and each host performs one general task synchronization in each task period;
hosts recovering from a fault are started through initial power-on synchronization, which determines the initial identity of each host.
Preferably, when each host exchanges data with the other two hosts, the input data, output data and intermediate state information are voted on, and the voting modes comprise bitwise voting, selection voting and median voting:
median voting applies when the input data of the hosts are inconsistent, and yields consistent output data on every host; selection voting applies when the data to be compared on the hosts are not fully identical, and every host outputs the consistent data in the intersection of the three hosts' data; bitwise voting compares the data exchanged between two hosts bit by bit and passes when the two are consistent.
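The three voting modes can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, data is modelled as bytes, numbers and sets respectively, and `majority_2oo3` is an added helper showing how pairwise agreement leads to a 2-out-of-3 decision; none of this is the patent's own implementation.

```python
from collections import Counter
from statistics import median


def bitwise_vote(a: bytes, b: bytes) -> bool:
    """Bitwise voting: two hosts agree when every bit of their data matches."""
    return len(a) == len(b) and all((x ^ y) == 0 for x, y in zip(a, b))


def majority_2oo3(values):
    """Return the value at least two of the three hosts agree on, else None.

    Hypothetical helper; values must be hashable (e.g. bytes).
    """
    value, count = Counter(values).most_common(1)[0]
    return value if count >= 2 else None


def median_vote(x1, x2, x3):
    """Median voting: differing numeric inputs are reduced to one common value."""
    return median([x1, x2, x3])


def selection_vote(host_data):
    """Selection voting: keep only the items present on all three hosts."""
    first, second, third = (set(items) for items in host_data)
    return first & second & third
```

Median voting suits inputs such as sampled analogue values that legitimately differ slightly between hosts, whereas bitwise voting suits data that must match exactly.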
Preferably, the platform performs fault self-diagnosis with a health check mechanism that periodically checks the running state of the applications in the platform in TCP, exec or HTTP mode. The TCP and HTTP checks initiate a connection request to verify that the application's IP address and port are open as expected; the exec check runs a custom diagnostic script. The application state is monitored, and self-start recovery is triggered, restarting the application, when the state is abnormal.
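The three check modes can be approximated with standard-library calls, as in the sketch below. It is an illustration only: in the described platform these checks would be carried out by the cloud platform itself, and the function names, the timeouts and the `failure_threshold` of three consecutive failures are assumptions, not values from the patent.

```python
import socket
import subprocess
import time
import urllib.request


def tcp_check(host, port, timeout=1.0):
    """TCP mode: healthy if a connection to IP address + port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(url, timeout=1.0):
    """HTTP mode: healthy if the request returns a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except Exception:
        return False


def exec_check(command, timeout=5.0):
    """Exec mode: healthy if the custom diagnostic command exits with 0."""
    try:
        return subprocess.run(command, timeout=timeout).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def monitor(check, restart, rounds, interval=0.0, failure_threshold=3):
    """Run `check` periodically; call `restart` after repeated failures.

    Returns the number of restarts triggered. The threshold is an assumption.
    """
    failures = restarts = 0
    for _ in range(rounds):
        if check():
            failures = 0
        else:
            failures += 1
            if failures >= failure_threshold:
                restart()
                restarts += 1
                failures = 0
        time.sleep(interval)
    return restarts
```

A caller might pass `lambda: tcp_check("127.0.0.1", 8080)` as `check` and a function that restarts the application container as `restart`; both the address and the port here are placeholders.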
Preferably, after a faulty host has been repaired and powered on again, a state following mechanism is used: the host obtains state following data from a normally running host over a socket, and restores and inherits its data from it;
the state following data includes:
1) The timestamp and cycle number of the master-mode host at the time the history information is sent;
2) Application input data;
3) Information related to the communication link management table;
4) Application intermediate state data.
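The four items of state following data map naturally onto a single record sent over a socket. The sketch below assumes a JSON payload with a 4-byte length prefix; the field names, the wire format and the helper functions are all hypothetical, since the patent only states that the data is obtained "in a socket mode".

```python
import json
import socket
from dataclasses import asdict, dataclass, field


@dataclass
class FollowState:
    timestamp: float                  # master's time when the history is sent
    cycle_number: int                 # master's cycle (period) number
    application_input: dict = field(default_factory=dict)
    link_table: dict = field(default_factory=dict)          # link management table
    intermediate_state: dict = field(default_factory=dict)  # application state


def _recv_exact(sock, size):
    """Read exactly `size` bytes or raise if the peer closes early."""
    data = bytearray()
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError("peer closed before the frame was complete")
        data.extend(chunk)
    return bytes(data)


def send_state(sock, state):
    """Master side: send one length-prefixed JSON frame (assumed format)."""
    payload = json.dumps(asdict(state)).encode("utf-8")
    sock.sendall(len(payload).to_bytes(4, "big") + payload)


def recv_state(sock):
    """Following side: receive one frame and rebuild the state record."""
    length = int.from_bytes(_recv_exact(sock, 4), "big")
    return FollowState(**json.loads(_recv_exact(sock, length).decode("utf-8")))
```

After `recv_state` returns, the following host would load the record into its application before switching to the standby working mode.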
According to the technical solution provided by the embodiments of the invention, the application and its running environment are packaged in a container, making them lightweight and easy to migrate and deploy; the distributed cloud management center provides real-time monitoring and resource scheduling of the underlying physical service nodes, platform self-diagnosis, immediate fault recovery, and inheritance of historical variables and state data; besides the basic functions of a 2-out-of-3 secure computer, the platform can also support the development of peripheral applications.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to illustrate the technical solutions of the embodiments of the invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of the architecture of a cloud computing-based 2-out-of-3 secure computer platform according to an embodiment of the invention.
Fig. 2 is a flow chart of the identity switching triggered when one host of the 2-out-of-3 secure computer fails, according to an embodiment of the invention.
Fig. 3 is a flow chart of the execution of the synchronization module according to an embodiment of the invention, including initial power-on synchronization and general task synchronization.
Fig. 4 is a flow chart of the voting module according to an embodiment of the invention, including data exchange, synchronous voting and output.
Fig. 5 is a flow chart of packaging and starting the 2-out-of-3 secure computer software application according to an embodiment of the invention, comprising three steps: building an image with Docker container technology, allocating compute, storage and network resources, and starting the container.
Fig. 6 is the execution flow of the health check and state following designed for the characteristics of cloud computing according to an embodiment of the invention, covering two fault conditions: a virtual host fault and a service node fault. The overlay network provides each physical node with a unique virtual subnet and provides routing for the virtual hosts; if a physical node fails, the overlay network, by maintaining and updating the routing table, allows a virtual host on the failed node to migrate to a normal physical node with its IP address unchanged.
Fig. 7 is the execution flow of the state following mechanism according to an embodiment of the invention, comprising three steps: following request, identity switching and data inheritance.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
To facilitate understanding of the embodiments of the invention, several specific embodiments are further explained below with reference to the drawings; these embodiments in no way limit the embodiments of the invention.
With the development of information technology, cloud computing, as an innovative service model, has become a key information infrastructure supporting the development of many industries by virtue of its very large scale, virtualization, high reliability, generality, high scalability and on-demand service. Cloud computing is a development trend of the current era and a future direction for rail transit applications.
In 2019, cloud computing deployments on operating urban rail lines sprang up like bamboo shoots after rain. Cities with very large operating networks, such as Beijing, Shanghai, Guangzhou, Shenzhen and Wuhan, are deploying and promoting the construction of urban rail clouds, as are cities newly building metros, such as Hohhot and Taiyuan.
In September 2019, the first multi-line, multi-service-system urban rail cloud project was completed. Starting from the top-level design, it built a production center cloud platform, a disaster recovery center cloud platform and a station/depot cloud platform, and provides IaaS (Infrastructure as a Service) services to multiple systems, meeting the construction needs of Hohhot rail transit Lines 1 and 2.
On 20 May 2019, Zhengzhou, using converged cloud, 5G and Internet of Things technologies as the technical support for its smart metro, opened and put into operation the first line-network-level ANCC cloud platform in the country, deeply integrating the clearing center with the line centers.
As these examples show, cloud computing has become a development direction for the rail transit field. The invention therefore migrates the secure computer platform, one of the core components of rail transit, to the cloud by means of cloud computing technology, and adapts the secure computer platform to the characteristics of cloud computing.
The architecture of the cloud computing-based 2-out-of-3 secure computer platform provided by the embodiment of the invention is shown in Fig. 1. The platform consists of a distributed cloud management center, service nodes, secure computer virtualization containers and physical infrastructure, arranged as a layered architecture in that order from top to bottom. There is one cloud management center, and each service node is a host. There are three hosts, three secure computer virtualization containers and three physical infrastructures. The cloud management center carries out signaling and data communication with each of the three hosts (a first host, a second host and a third host); the hosts correspond one to one with the secure computer virtualization containers, and the containers correspond one to one with the physical infrastructures. Each host is in data communication with its corresponding container, and each container is in data communication with its corresponding physical infrastructure.
The software design of the 2-out-of-3 secure computer platform provides a software working platform on which safety-critical systems carry out communication, application computation, fault tolerance and safety functions. Each service node, together with its corresponding secure computer virtualization container and physical infrastructure, forms one host structure, giving three host structures in parallel. The three host structures operate independently to avoid common-mode faults. They form a loosely coupled redundant structure based on task-level synchronization and exchange data through virtual network technology. A 2-out-of-3 voting mechanism is used among the three host structures, which ensures the safety, availability and maintainability of the platform. Only the host in master mode may send information to external devices, which ensures that the output is unique.
The cloud management center has a distributed structure; it supports geographic disaster recovery and guards against single points of failure, so that monitoring of the service nodes and user application processes is not interrupted. If the communication link between any two hosts is interrupted, data can be forwarded through the third host, so that data voting continues normally.
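Forwarding through the third host amounts to choosing a two-hop path when the direct link is down. The sketch below is illustrative only; the host labels "A", "B" and "C", the link-set representation and the function name are assumptions.

```python
HOSTS = ("A", "B", "C")  # hypothetical labels for the three hosts


def route(src, dst, links_up):
    """Return the hop list from src to dst, or None if dst is unreachable.

    `links_up` is a set of frozenset pairs naming the links in service.
    `src` and `dst` must be two different hosts.
    """
    if frozenset((src, dst)) in links_up:
        return [src, dst]
    # Direct link is down: try to relay through the remaining host.
    (relay,) = [host for host in HOSTS if host not in (src, dst)]
    if frozenset((src, relay)) in links_up and frozenset((relay, dst)) in links_up:
        return [src, relay, dst]
    return None
```

With the A-B link interrupted and the other two links in service, `route("A", "B", ...)` yields the path through C, so the exchange needed for voting can still take place.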
The software of the 2-out-of-3 secure computer platform in the embodiment of the invention is packaged with Docker container technology and deployed on the cloud computing platform, after which it can be scheduled by the cloud platform. To suit the characteristics of cloud computing, namely virtualization (of network, storage and computing resources), high reliability (multi-replica data fault tolerance, homogeneous service nodes and task handover) and scalability (dynamic scaling of the cluster), the embodiment of the invention designs a health check mechanism and a state following mechanism. The health check monitors and safeguards the life cycle of the secure computer platform application, so that the application restarts by itself after an abnormal interruption. After a restart, however, the platform's application data has been destroyed along with the faulty instance. The embodiment therefore designs a state following mechanism, in which the restarted host inherits the historical application data from a normally working host, so that it can come back online immediately and resume service. The 2-out-of-3 secure computer platform of the embodiment thus not only continues to vote and operate normally when any one of the three hosts fails, but also brings the faulty host back online quickly after a fault and restores the integrity of the platform, which reduces operation and maintenance work.
The embodiment of the invention builds the 2-out-of-3 secure computer platform on a PaaS cloud platform, namely Kubernetes. After the distributed cloud management nodes (the cloud management center) and the three service nodes are deployed, the configuration environment and software body required by the application can be packaged into an image by container virtualization technologies such as Docker or LXC (Linux Containers), and the image is then deployed on the cloud computing platform. The image can be migrated at any time and the application started quickly from it. The specific implementation steps are: build the cloud computing platform; design the 2-out-of-3 secure computer software in three modules; package the secure computer platform software into an image with Docker container technology; start the secure computer platform application container from the image; and monitor the platform in real time through health checks and state following.
In terms of hardware architecture, the underlying hardware of the cloud computing-based secure computer platform (that is, the hardware on which the cloud computing platform is built) requires at least four physical servers (one cloud management node and three service nodes) and at most six physical servers (three management nodes and three service nodes); the cluster of management nodes forms the cloud management center. The physical configuration of the management nodes and service nodes is shown in Table 1:
Table 1 Physical configuration of management nodes and service nodes
Item | Requirement
System | CentOS 7 x64
CPU | > 2 cores
Memory | > 2 G
Storage | > 20 GiB
Each host preempts its master/standby priority according to the power-on (startup) sequence; when a host fails and recovers, the master/standby priorities of the three hosts are updated according to the initial state and the identity switching strategy. Following the safety-kernel concept, the periodic control of the platform software is divided into several micro-cycles. At the end of each micro-cycle the state of the communication links is self-diagnosed and reported to the cloud management center through the log system, which shortens the fail-safe response time.
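The micro-cycle idea can be sketched as a loop that runs one slice of work and then diagnoses the links before moving on. This is a rough illustration under assumptions: the function name, the shape of the link status and the use of Python's `logging` module as the "log system" are all hypothetical.

```python
import logging

logger = logging.getLogger("safety_platform")  # hypothetical logger name


def run_period(tasks, link_probe, report=logger.info):
    """Run one control period as a sequence of micro-cycles.

    `tasks` is a list of callables, one per micro-cycle. `link_probe`
    returns the current link status, e.g. {"A-B": True}. After each
    micro-cycle the status is reported through the log system.
    """
    statuses = []
    for index, task in enumerate(tasks):
        task()                      # application work for this micro-cycle
        status = link_probe()       # self-diagnose the communication links
        report("micro-cycle %d link status: %s", index, status)
        statuses.append(status)
    return statuses
```

Because the links are checked after every micro-cycle rather than once per full period, a broken link is noticed within one micro-cycle, which is the source of the shorter response time described above.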
Before designing a software module, in order to distinguish a normal working host from a host after fault recovery and execute an identity switching strategy, five working modes are formulated in the embodiment of the invention, as follows:
1) Power-on mode: the host is in the power-on startup stage. Following the preemption principle shown in Table 2, it sends synchronization requests to the other two hosts immediately after power-on; the host powered on first receives the most synchronization requests and becomes the host in the master working mode;
2) Master working mode: the host is in a normal working state, its calculation result is consistent with that of at least one other host, and its calculation result is used as the sole output of the whole system;
3) Standby working mode: the host is in a normal working state and its calculation result is consistent with that of at least one other host, but it does not output the result externally;
4) Following mode: a host in the 2-out-of-3 secure computer platform that is powered on again after a fault enters the following mode once the identity policy has been executed. In the following mode, the host must wait for the historical state information sent by the host in the master working mode and finish inheriting that historical data before it can enter the standby working mode;
5) Reset mode: when a host in the 2-out-of-3 safety redundant system fails, or its voting result is inconsistent with those of the other two hosts, it enters the reset mode.
Based on these five working modes, Fig. 2 shows the identity switching flow triggered when one host fails in the 2-out-of-3 secure computer provided by the embodiment of the invention.
Hosts in the master and standby working modes provide normal application processing; hosts in the other working modes do not.
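The five working modes and the rule that only master and standby hosts process applications can be sketched as a small state model. This is an illustrative sketch only: the event names and the transition table are assumptions read off the mode descriptions, and the complete identity switching flow is the one in Fig. 2, which is not reproduced here.

```python
from enum import Enum


class Mode(Enum):
    POWER_ON = "power-on"
    MASTER = "master"
    STANDBY = "standby"
    FOLLOWING = "following"
    RESET = "reset"


def provides_service(mode: Mode) -> bool:
    """Only master and standby hosts provide normal application processing."""
    return mode in (Mode.MASTER, Mode.STANDBY)


# Transitions taken from the mode descriptions; event names are hypothetical.
TRANSITIONS = {
    (Mode.MASTER, "fault_or_vote_mismatch"): Mode.RESET,
    (Mode.STANDBY, "fault_or_vote_mismatch"): Mode.RESET,
    (Mode.RESET, "repowered_identity_policy_done"): Mode.FOLLOWING,
    (Mode.FOLLOWING, "history_inherited"): Mode.STANDBY,
}


def next_mode(mode: Mode, event: str) -> Mode:
    """Return the next mode; unknown (mode, event) pairs leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)
```

For example, a standby host whose vote disagrees with the other two moves to `Mode.RESET`, and it provides application processing again only after passing through `Mode.FOLLOWING` back to `Mode.STANDBY`.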
Table 2 Host power-on synchronization judgment logic truth table
Synchronization requests received    Synchronization signals received    Synchronization result
2    0    Synchronization succeeded; the host is the first powered-on host
1    1    Synchronization succeeded; the host is the second powered-on host
0    1    Synchronization succeeded; the host is the third powered-on host
0    0    Synchronization failed
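The truth table maps directly onto a lookup. The sketch below is a minimal illustration; the function name is hypothetical, and treating any combination not listed in Table 2 as a failure is an assumption, since the table only lists four rows.

```python
from typing import Optional

# (synchronization requests received, synchronization signals received) -> power-on rank
TRUTH_TABLE = {
    (2, 0): 1,  # first powered-on host
    (1, 1): 2,  # second powered-on host
    (0, 1): 3,  # third powered-on host
}


def power_on_rank(requests: int, signals: int) -> Optional[int]:
    """Return the power-on rank (1, 2 or 3), or None when synchronization failed.

    Combinations absent from the table, including (0, 0), are treated as
    failures; this is an assumption for unlisted rows.
    """
    return TRUTH_TABLE.get((requests, signals))
```

A host for which `power_on_rank` returns 1 received the most synchronization requests and takes the master working mode.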
The software design keeps the three functional modules of the secure computer platform intact: the data communication module, the synchronization module and the two-out-of-three voting module.
Unlike plain Ethernet communication, the data communication module adopts overlay network technology, which superimposes a virtualized network (the overlay network) on the physical network architecture. With the overlay, a new virtual subnet can be added on top of the service node subnet; for example, a virtual subnet 10.244.159.0/36 can be set up on the physical subnet 192.168.1.0/36, which keeps the software network environment of the secure computer platform independent and isolated. Under the overlay network, hosts still communicate with each other through sockets, and the communication delay is about 0.14 ms. In the invention, the overlay network acts as both a virtual switch and a virtual router: as a virtual switch, it assigns each physical service node a virtual subnet that is unique within the platform; as a virtual router, the overlay network instances on the service nodes jointly maintain a routing table so that the virtual networks on the nodes can reach one another.
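The virtual router role described above amounts to a table that maps each node's virtual subnet to the physical node hosting it. The following sketch illustrates that lookup with Python's standard `ipaddress` module; all addresses and prefix lengths are hypothetical placeholders, not values from the embodiment.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table: one platform-unique virtual subnet per service node.
ROUTES = {
    ipaddress.ip_network("10.244.1.0/24"): "192.168.1.11",
    ipaddress.ip_network("10.244.2.0/24"): "192.168.1.12",
    ipaddress.ip_network("10.244.3.0/24"): "192.168.1.13",
}


def next_hop(container_ip: str) -> Optional[str]:
    """Return the physical node that hosts the subnet containing container_ip."""
    addr = ipaddress.ip_address(container_ip)
    for subnet, node_ip in ROUTES.items():
        if addr in subnet:
            return node_ip
    return None  # address is outside every virtual subnet
```

Because every node holds the same table, a container on one node can reach a container on another by forwarding to the returned physical address.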
Task-level synchronization comprises initial power-on synchronization and general task synchronization. After startup a host enters the power-on mode, in which the three hosts of the 2-out-of-3 safety redundant system first perform an overall task synchronization, namely the initial power-on synchronization; a host can proceed only after completing it. General task synchronization is performed once in every task period to correct synchronization and eliminate the accumulated software clock error.
Fig. 3 is the execution flowchart of the synchronization module according to an embodiment of the present invention. The synchronization module covers initial power-on synchronization and general task synchronization. When a host starts for the first time or restarts after fault recovery, one initial power-on synchronization is performed to determine the initial identity of each host; the synchronization information sent comprises a synchronization request and a synchronization pulse signal. On startup each host sends synchronization requests to the other two hosts and counts the synchronization requests it receives; identities are assigned according to these counts, and the host that receives the most requests becomes the master-mode host, which is responsible for output. The master-mode host then sends a synchronization signal to the other two hosts and starts a task period. To identify a faulty host that has been powered on again, the synchronization signal frame sent by the master host also carries the identity information of the three hosts. The synchronization is loose and implemented in software; unlike ordinary software synchronization, however, at the end of each synchronization period the platform corrects the synchronization time again against the master host (general task synchronization), removing the clock error accumulated during the period.
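The benefit of correcting against the master once per task period can be shown with a toy model. The sketch assumes a constant clock drift per period and an ideal correction that resets the offset to zero (network delay is ignored); the function name and the numbers are illustrative only.

```python
from typing import List


def offsets_after(cycles: int, drift_us_per_cycle: int, resync: bool) -> List[int]:
    """Offset in microseconds between a standby clock and the master at each period end."""
    offset = 0
    history = []
    for _ in range(cycles):
        offset += drift_us_per_cycle  # drift accumulated during this task period
        history.append(offset)
        if resync:
            offset = 0  # general task synchronization: realign to the master
    return history
```

Without the per-period correction the offset grows linearly with the number of periods; with it, the offset never exceeds the drift of a single period.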
Fig. 4 is the flowchart of the voting module according to an embodiment of the present invention, covering data exchange, synchronous voting and output. The two-out-of-three voting module uses three data comparison algorithms. Unlike the hardware voting of a traditional secure computer, the cloud-based 2-out-of-3 secure computer platform votes purely in software, so the voting module is decoupled from hardware; each host exchanges data with the other two hosts and votes on the input data, the output data and other necessary intermediate state information. Following the principle of pairwise comparison, the voting process selects, from the three data items (the local one and those of the other two hosts), two that are identical, and uses them as the output of the whole system.
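The pairwise comparison can be sketched as follows. This is a minimal illustration, assuming the exchanged data is available as byte strings; the function name is hypothetical, and returning `None` when no two copies agree is a choice made for this sketch.

```python
from typing import Optional


def vote_2oo3(local: bytes, peer_a: bytes, peer_b: bytes) -> Optional[bytes]:
    """Return the value held by at least two of the three hosts, else None."""
    if local == peer_a or local == peer_b:
        return local      # the local copy agrees with at least one peer
    if peer_a == peer_b:
        return peer_a     # the local copy is the outlier
    return None           # no two copies agree: nothing may be output
```

When the function returns a value different from `local`, the local host is the one whose result disagrees with the other two, which is the condition for entering the reset mode.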
The voting modes of the embodiment of the invention comprise bit-by-bit voting, selection voting and median voting, and various voting modes are respectively described below:
1) The input data are not consistent, but the output data must be guaranteed consistent (median comparison).
Timestamps and random numbers belong to class 1): because of processor clock drift and randomness, the values generated by the individual hosts cannot be guaranteed to agree. Although clock drift exists, its effect within one period is tolerable, so the values are processed by taking the median value, here $D = (D_1 + D_2 + D_3)/3$, which ensures that after data comparison all hosts hold the same data. Depending on the requirements and data characteristics of the actual application, the maximum $D = \max(D_1, D_2, D_3)$, the minimum $D = \min(D_1, D_2, D_3)$ or another algorithm may be used instead.
2) The data to be compared on the three hosts are not identical, and the consistent data in their intersection must be output (selection comparison).
Because the three hosts cannot be perfectly synchronized, a host is allowed to receive multiple frames from the same communication object within one period (the frames must be distinguishable as newer or older; otherwise they are treated as redundant data). In this case the data handed to the upper-layer application is the latest data that compares successfully, so that the upper-layer application always processes the latest trusted data.
3) The data of both parties must be strictly identical and are compared bit by bit (bit-by-bit comparison).
In a bit-by-bit comparison, the data can be output only if the two sides are completely identical. If even one bit differs, the comparison fails and the data cannot be output.
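The three voting modes can be sketched side by side. The function names and the frame representation (a sequence number plus a payload) are assumptions made for illustration. Note that the formula given for mode 1) is the arithmetic mean of the three values, which is what the first function computes, and that taking the intersection over all three hosts in mode 2) is one reading of "the consistent data in their intersection".

```python
from typing import Optional, Set, Tuple

Frame = Tuple[int, bytes]  # (sequence number, payload); a higher number means newer


def mean_vote(d1: float, d2: float, d3: float) -> float:
    """Mode 1: every host derives the same value from three differing inputs."""
    return (d1 + d2 + d3) / 3


def selection_vote(a: Set[Frame], b: Set[Frame], c: Set[Frame]) -> Optional[Frame]:
    """Mode 2: newest frame that all three hosts received identically."""
    common = a & b & c
    return max(common, key=lambda frame: frame[0]) if common else None


def bitwise_vote(x: bytes, y: bytes) -> Optional[bytes]:
    """Mode 3: output only if both sides are identical in every bit."""
    return x if x == y else None
```

For instance, if one host has not yet received the newest frame, `selection_vote` falls back to the most recent frame that all three hold, so the upper-layer application still receives data that has been compared successfully.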
After the three software modules of the 2-out-of-3 secure computer have been designed, the software must be packaged into an image with Docker container technology. Once the image is packaged, the 2-out-of-3 secure computer platform of the invention builds a container by importing the image and attaching the corresponding resources (memory, storage and network resources), as shown in Fig. 5, and then starts the 2-out-of-3 virtual hosts.
With container technology, the invention can bring a virtual host online quickly on any physical node.
Finally, to satisfy the fail-safe principle and adapt it to cloud computing technology, the invention designs a health checking mechanism and a state following mechanism, which allow a host to return to a safe state after a failure and inherit the variables and state data of the running application. Fig. 6 shows the health check and state following flow designed for cloud computing characteristics according to an embodiment of the present invention; it covers two fault conditions, virtual host failure and service node failure. The overlay network provides each physical node with a unique virtual subnet and provides routing for the virtual hosts; if a physical node fails, the overlay network maintains and updates the routing table so that the virtual host on the failed node migrates to a healthy physical node with its IP address unchanged.
The health checking mechanism is a form of self-diagnosis that periodically checks the running state of the applications in the platform by TCP, exec or HTTP. TCP and HTTP checks initiate a connection request to verify that the application's IP address and port are open as expected. An exec check runs a custom diagnostic script to monitor the application state and trigger automatic restart and recovery when the state is abnormal. Under this mechanism, the number of running hosts is always kept at three.
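The TCP and exec checks can be illustrated with the Python standard library. This is a sketch of the idea rather than the platform's implementation: on Kubernetes such checks would normally be declared as container probes instead of hand-written, and the function names, timeouts and restart callback below are hypothetical.

```python
import socket
import subprocess
import sys
from typing import Callable, List


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """TCP check: the application is healthy if its IP address and port accept a connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def exec_probe(command: List[str], timeout: float = 5.0) -> bool:
    """Exec check: run a diagnostic command; exit code 0 means healthy."""
    try:
        return subprocess.run(command, timeout=timeout).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def check_and_recover(healthy: bool, restart: Callable[[], None]) -> bool:
    """Trigger restart() when a check failed; return True if a restart was triggered."""
    if not healthy:
        restart()
        return True
    return False
```

A periodic loop would call one of the probes for each of the three hosts and pass the result to `check_and_recover`, which is how the number of running hosts is kept at three.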
Fig. 7 shows the execution flow of the state following mechanism provided in an embodiment of the present invention, which comprises three steps: following request, identity switching and data inheritance. The state following mechanism mainly solves the data inheritance problem. The application services inside the 2-out-of-3 secure computer platform update their data very frequently; if the system were connected to a database, the frequent interaction between the application and the database would interfere with normal operation, which is particularly harmful for applications with high precision requirements, heavy resource usage and large volumes of voting data. Therefore, when a failed host has been repaired and powered on again, the identity modes of all hosts are updated according to the identity switching strategy, and the restarted host is placed in the following mode. State following then starts: the restarted host obtains the system's variable data and internal application running state from a normally running host over a socket and uses them for data recovery and inheritance. With the identities of all hosts recorded, the master-mode host collects the historical application variables and state data at the end of the current task period and sends them to the restarted host; when the next period arrives, the restarted host runs in step with the normally working hosts after general task synchronization. Unlike storing application data in a database, state following therefore solves the data inheritance problem with a single interaction.
The state following data information contains the following contents:
1) The timestamp and period number of the master-mode host at the moment the historical information is sent. After receiving the historical information, the host first performs time correction, that is, it adjusts its own timestamp and period number to match those of the master-mode host. This keeps the host's timestamp and period number within the allowable range and avoids misjudging the validity of messages because of them.
2) Input application data.
3) Information related to the communication link management table.
4) Other necessary intermediate state data of the application.
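The four items above suggest a single message that the master-mode host sends once over the socket. The sketch below models that message and the receiving side; the class names, field names and JSON encoding are assumptions for illustration, since the source does not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StateFollowData:
    timestamp_ms: int          # item 1: master timestamp at send time
    cycle_number: int          # item 1: master period number
    input_data: dict = field(default_factory=dict)          # item 2
    link_table: dict = field(default_factory=dict)          # item 3
    intermediate_state: dict = field(default_factory=dict)  # item 4

    def to_bytes(self) -> bytes:
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "StateFollowData":
        return cls(**json.loads(raw.decode("utf-8")))


@dataclass
class FollowerHost:
    timestamp_ms: int = 0
    cycle_number: int = 0
    mode: str = "following"
    state: dict = field(default_factory=dict)

    def inherit(self, data: StateFollowData) -> None:
        # Time correction comes first: adopt the master's timestamp and period number.
        self.timestamp_ms = data.timestamp_ms
        self.cycle_number = data.cycle_number
        # Then take over the application data and enter the standby working mode.
        self.state = {
            "input": data.input_data,
            "links": data.link_table,
            "intermediate": data.intermediate_state,
        }
        self.mode = "standby"
```

A restarted host would receive the bytes, rebuild the message with `StateFollowData.from_bytes`, and call `inherit` before joining the next task period.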
In summary, the embodiment of the invention packages the application and its runtime environment in a container, which makes them lightweight and easy to migrate and deploy; the platform diagnoses itself, recovers from faults immediately and inherits historical variables and state data; the distributed cloud management center monitors the underlying physical service nodes in real time and schedules their resources; with geographic disaster recovery and protection against single points of failure, the failure of one host does not affect the functions of the two-out-of-three secure computer, and normal operation is restored within about 3 seconds; and the platform is extensible, so that besides providing the basic functions of the 2-out-of-3 secure computer, peripheral applications can be developed, such as a front end displaying the network traffic of the secure computer platform, the identity mode of each host, and memory and CPU usage.
Those of ordinary skill in the art will appreciate that: the drawing is a schematic diagram of one embodiment and the modules or flows in the drawing are not necessarily required to practice the invention.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus or system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, with reference to the description of method embodiments in part. The apparatus and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (3)

1. A cloud computing-based 2-out-of-3 secure computer platform, comprising: a cloud management center, service nodes, secure computer virtualization containers and physical infrastructure, which form a layered architecture and are arranged in that order from top to bottom; there is one cloud management center, each service node is a host, the cloud management center carries out signaling and data communication with each of the three hosts, the hosts correspond one-to-one to the secure computer virtualization containers, the secure computer virtualization containers correspond one-to-one to the physical infrastructures, each host carries out data communication with its corresponding secure computer virtualization container, and each secure computer virtualization container carries out data communication with its corresponding physical infrastructure;
independent operation is carried out among the three host structures, a loose coupling redundant structure is achieved among the three host structures based on task-level synchronization, and data exchange is carried out through a virtual network technology; a voting mechanism of taking 2 from 3 is adopted among the three host structures, and only the host in the main mode can send information to other external devices;
the cloud management center has a distributed structure, supports geographic disaster recovery and guards against single-point faults, so that monitoring of the service nodes and of user application processes is not interrupted; when the communication link between any two hosts is interrupted, data is forwarded through the third host so that data voting proceeds normally;
after a distributed cloud management center and three service nodes are deployed, packaging a configuration environment and a software main body required by an application into a mirror image through a container virtualization technology, deploying the mirror image on a cloud computing platform, and starting a secure computer platform application container through the mirror image, wherein the mirror image can be migrated at any time and can start the application;
each host computer preemptively takes the main and standby priority according to the power-on sequence, and when faults and recovery occur, the main and standby priorities of the three host computers are updated according to the initial state and the identity switching strategy;
the working modes of the host comprise five working modes as follows:
1) Power-on mode: the host is in the power-on startup stage; after power-on it sends synchronization requests to the other two hosts, the host powered on first receives the largest number of synchronization requests, and that host enters the master working mode;
2) Master working mode: the host is in a normal working state, its calculation result is consistent with that of at least one other host, and its calculation result is used as the sole output of the whole system;
3) Standby working mode: the host is in a normal working state and its calculation result is consistent with that of at least one other host, but the host does not output the calculation result externally;
4) Following mode: the host is powered on again after a fault and enters the following mode once the identity policy has been executed; in the following mode, the host waits for the historical state information sent by the host in the master working mode, completes the inheritance of the historical data, and then enters the standby working mode;
5) Reset mode: when the host fails or its voting result is inconsistent with those of the other two hosts, the host enters the reset mode;
in the power-on mode, the synchronization judgment logic truth table followed by the host is shown in the table:
Synchronization judgment logic truth table
Synchronization requests received    Synchronization signals received    Synchronization result
2    0    Synchronization succeeded; the host is the first powered-on host
1    1    Synchronization succeeded; the host is the second powered-on host
0    1    Synchronization succeeded; the host is the third powered-on host
0    0    Synchronization failed
after startup, a host enters the power-on mode, in which the hosts of the 2-out-of-3 safety redundant system first perform initial power-on synchronization: each host sends synchronization requests to the other two hosts on startup and counts the synchronization requests it receives, and the host that receives the largest number of synchronization requests becomes the master-mode host;
the master mode host sends a synchronizing signal to the other two computers, a task period is started, and each host performs one-time general task synchronization in each task period;
when the fault recovery host is started, the initial identity of each host is determined through primary initial power-on synchronization;
when each host computer exchanges data with other two computers, the input and output data and intermediate state information are voted, and the voting modes comprise bit-by-bit voting, selection voting and median voting:
the median vote is that the input data of each host computer is inconsistent, and the output data of each host computer is consistent; the selection voting is that the data to be compared in each host computer are not identical, and each host computer outputs consistent data in the three-host computer intersection; the bitwise voting is that after the bitwise comparison is carried out on the two host data for data exchange, the two host data are consistent.
2. The cloud computing-based 2-out-of-3 secure computer platform according to claim 1, wherein the platform performs fault self-diagnosis with a health checking mechanism that periodically checks the running state of the applications in the platform by TCP, exec or HTTP: TCP and HTTP checks initiate a connection request to verify that the application's IP address and port are open as expected, and an exec check runs a custom diagnostic script that monitors the application state and triggers automatic restart and recovery when the state is abnormal.
3. The cloud computing-based 2-out-of-3 secure computer platform according to claim 1, wherein after a failed host has been repaired and powered on again, a state following mechanism obtains state following data from a normally running host over a socket, and data recovery and inheritance are performed according to the state following data;
the state following data includes:
1) The timestamp and period number of the master-mode host at the moment the historical information is sent;
2) Input application data;
3) Information related to the communication link management table;
4) Application intermediate state data.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110355059.2A CN113127270B (en) 2021-04-01 2021-04-01 Cloud computing-based 3-acquisition-2 secure computer platform

Publications (2)

Publication Number Publication Date
CN113127270A CN113127270A (en) 2021-07-16
CN113127270B true CN113127270B (en) 2023-06-27

Family

ID=76774512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110355059.2A Active CN113127270B (en) 2021-04-01 2021-04-01 Cloud computing-based 3-acquisition-2 secure computer platform

Country Status (1)

Country Link
CN (1) CN113127270B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant