CN113127270A

CN113127270A - Cloud computing-based 2-out-of-3 safety computer platform

Info

Publication number: CN113127270A
Application number: CN202110355059.2A
Authority: CN
Inventors: 唐涛; 朱力; 李松; 王悉; 王洪伟
Original assignee: Beijing Jiaotong University
Current assignee: Beijing Jiaotong University
Priority date: 2021-04-01
Filing date: 2021-04-01
Publication date: 2021-07-16
Anticipated expiration: 2041-04-01
Also published as: CN113127270B

Abstract

The invention provides a cloud computing-based 2-out-of-3 safety computer platform. The platform has a layered architecture comprising, from top to bottom, a cloud management center, service nodes, safety computer virtualization containers and physical infrastructure. There is one cloud management center, and the service nodes are three hosts; the cloud management center is in signaling and data communication with each of the three hosts. The hosts correspond one-to-one with the safety computer virtualization containers, and the safety computer virtualization containers correspond one-to-one with the physical infrastructure; each host is in data communication with its corresponding safety computer virtualization container, and each safety computer virtualization container is in data communication with its corresponding physical infrastructure. The application and its operating environment are containerized, lightweight, and easy to migrate and deploy; the distributed cloud management center performs real-time monitoring, resource scheduling and platform self-diagnosis of the lower-layer physical service nodes, recovers from faults immediately, and inherits historical variable and state data; in addition to providing the basic functions of a 2-out-of-3 safety computer, the platform also supports the development of peripheral applications.

Description

Cloud computing-based 2-out-of-3 safety computer platform

Technical Field

The invention relates to the technical field of safety computers, and in particular to a cloud computing-based 2-out-of-3 safety computer platform.

Background

Safety computer technology is used in fields such as rail transit and aerospace. It guarantees the correctness of the inputs, outputs and intermediate states of equipment or applications, and mostly adopts multi-modular redundancy.

In the field of rail transit, both ground equipment and on-board equipment are built on safety computers. When equipment fails suddenly for physical or other reasons, another system or an emergency handling scheme must record the failure state in time and restore the equipment to a safe condition. In other words, the fail-safe principle must be followed: in the event of a failure, the system state must be driven to a safe state.

In terms of architecture, the internal architecture of a safety computer platform generally adopts either a dual-channel structure (double 2-out-of-2) or a multi-channel structure (2-out-of-3). The channels monitor one another and vote on their respective inputs and outputs to judge whether each channel is normal or abnormal. The architecture mainly comprises three modules: a data communication module, an inter-channel synchronization module, and an input/output 2-out-of-3 voting module.

At present, prior-art 2-out-of-3 safety computer platforms, in both software and hardware, have the following defects:

1) The redundant design concept increases the number, and therefore the cost, of boards or hosts.

The SICAS system of SIEMENS (Germany) and the SelTrac system of a French company are both based on a 2-out-of-3 structure, and both include safety computers built on the redundant design concept. The multi-channel redundant design inevitably multiplies the number of boards or hosts, so that a complete set of safety computer equipment occupies one or more cabinets.

2) The boards are bound to the software, so a failure of either hardware or software can cause the safety computer function to fail.

The general-purpose hardware of a 2-out-of-3 safety computer mainly comprises modules such as a CPU module, a memory module, a power supply module and peripheral circuits. A physical failure of any module increases the probability that the safety computer function fails.

3) Maintenance and replacement interrupt application services.

Safety computer platform hardware has a finite mean time to failure, that is, a limited service life. Once equipment fails or hardware ages, part of the safety computer is inevitably out of service during the time needed for maintenance and replacement, which interrupts application services.

Disclosure of Invention

Embodiments of the present invention provide a cloud computing-based 2-out-of-3 safety computer platform to overcome the problems of the prior art.

In order to achieve the purpose, the invention adopts the following technical scheme.

A cloud computing-based 2-out-of-3 safety computer platform, comprising: a cloud management center, service nodes, safety computer virtualization containers and physical infrastructure, which form a layered architecture and are arranged in that order from top to bottom; there is one cloud management center, the service nodes are three hosts, and the cloud management center is in signaling and data communication with each of the three hosts; the hosts correspond one-to-one with the safety computer virtualization containers, and the safety computer virtualization containers correspond one-to-one with the physical infrastructure; each host is in data communication with its corresponding safety computer virtualization container, and each safety computer virtualization container is in data communication with its corresponding physical infrastructure.

Preferably, the three host structures operate independently of one another, form a loosely coupled redundant structure based on task-level synchronization, and exchange data through virtual network technology; a 2-out-of-3 voting mechanism is adopted among the three host structures, and only the host in the main mode can send information to external devices.

Preferably, the cloud management center has a distributed structure that supports geographic disaster recovery and protects against single points of failure, so that monitoring of the service nodes and user application processes is not interrupted; if the communication link between any two hosts is interrupted, data is forwarded through the third host so that data voting continues normally.

Preferably, after the distributed cloud management center and the three service nodes are deployed, the configuration environment and the software body required by the application are packaged into an image by container virtualization technology, the image is deployed on the cloud computing platform, and the application container of the safety computer platform is started from the image; the image can be migrated and started at any time.

Preferably, each host preempts its main/standby priority according to the power-on order, and when a host fails or recovers, the main/standby priorities of the three hosts are updated according to the initial state and the identity switching strategy;

the host has the following five working modes:

1) power-on mode: the host is in the power-on start-up stage and, after power-on, sends synchronization requests to the other two hosts; the host powered on first receives the largest number of synchronization requests and enters the main working mode;

2) main working mode: the host is in a normal working state, its calculation result is consistent with that of at least one other host, and its calculation result is used as the sole output of the whole system;

3) standby working mode: the host is in a normal working state and its calculation result is consistent with that of at least one other host, but it does not output its calculation result externally;

4) following mode: a host that is powered on and started again after a fault enters the following mode once the identity strategy has been executed; in the following mode, the host waits for the historical state information sent by the host in the main working mode, completes the inheritance of the historical data, and then enters the standby working mode;

5) reset mode: when the host fails or its voting result is inconsistent with those of the other two hosts, the host enters the reset mode.

Preferably, in the power-on mode, the host follows the synchronization decision logic truth table shown in Table 2:

Table 2

Number of synchronization requests received	Number of synchronization signals received	Synchronization result
2	0	Synchronization succeeds; the host is the first host powered on
1	1	Synchronization succeeds; the host is the second host powered on
0	1	Synchronization succeeds; the host is the third host powered on
0	0	Synchronization fails

Preferably, the host enters the power-on mode after start-up. In the power-on mode, the hosts of the 2-out-of-3 safety redundancy system first perform one initial power-on synchronization: each host sends synchronization requests to the other two hosts at start-up, counts the synchronization requests it receives, and switches its identity according to that count, and the host that receives the most synchronization requests becomes the main-mode host;

the main-mode host sends a synchronization signal to the other two hosts and starts a task period, and the hosts perform one general task synchronization in every task period;

the initial power-on synchronization is also performed once when a fault-recovered host starts, so as to determine the initial identity of each host.

Preferably, when each host exchanges data with the other two hosts, the input and output data and the intermediate state information are voted on, and the voting modes include bitwise voting, selective voting and median voting:

median voting is used when the input data of the hosts are inconsistent but the output data of the hosts must be consistent; selective voting is used when the data to be compared in the hosts are not completely identical, and each host outputs the consistent data in the intersection of the three hosts; bitwise voting compares the data exchanged between two hosts bit by bit and requires the data of the two hosts to be identical.

Preferably, the platform performs fault self-diagnosis with a health check mechanism that periodically checks the running state of the applications inside the platform by TCP, exec or HTTP. The TCP and HTTP checks initiate a connection request to verify that the application's IP address and port are open; the exec check executes a custom diagnostic script to monitor the application state. When the state is abnormal, automatic restart and recovery is triggered.
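A TCP-style check of the kind described above can be sketched as follows. This is only an illustration under stated assumptions, not the platform's implementation: the names `tcp_probe` and `needs_restart`, the one-second timeout, and the policy of restarting after three consecutive failed checks are all introduced here for the example.

```python
import socket
from typing import Sequence


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out: treat the application as unhealthy.
        return False


def needs_restart(history: Sequence[bool], failure_threshold: int = 3) -> bool:
    """Assumed policy: restart after `failure_threshold` consecutive failed checks."""
    recent = list(history)[-failure_threshold:]
    return len(recent) == failure_threshold and not any(recent)
```

Requiring several consecutive failures before a restart is one way to avoid restarting on a single transient network error; the threshold would have to be chosen against the fail-safe response time that the application requires.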

Preferably, after a faulty host has been repaired and powered on again, it acquires state following data from a normally running host over a socket by means of the state following mechanism, and performs data recovery and inheritance according to the state following data;

the state following data includes:

1) the timestamp and cycle number of the main-mode host at the moment the historical information is sent;

2) application input data;

3) information related to the communication link management table;

4) application intermediate state data.
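The four items of state following data listed above could be carried in a single message. The sketch below is hypothetical: the class name `StateFollowingData`, its field names, and the choice of JSON as the wire encoding are assumptions made for illustration, since the text only states that the data is transferred over a socket.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class StateFollowingData:
    timestamp: float                     # 1) main-mode host timestamp at send time
    cycle_number: int                    # 1) main-mode host cycle number at send time
    application_input: Dict[str, Any]    # 2) application input data
    link_table: Dict[str, Any]           # 3) communication link management table information
    intermediate_state: Dict[str, Any]   # 4) application intermediate state data

    def encode(self) -> bytes:
        """Serialize to bytes suitable for sending over a socket (JSON is an assumption)."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def decode(cls, payload: bytes) -> "StateFollowingData":
        """Rebuild the message on the recovering host."""
        return cls(**json.loads(payload.decode("utf-8")))
```

A recovering host would call `StateFollowingData.decode` on the received bytes and restore its application state from the fields before entering the standby working mode.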

According to the technical scheme provided by the embodiments of the invention, the application and its running environment are packaged in containers, making them lightweight and easy to migrate and deploy; the distributed cloud management center performs real-time monitoring, resource scheduling and platform self-diagnosis of the lower-layer physical service nodes, recovers from faults immediately, and inherits historical variable and state data; and in addition to providing the basic functions of a 2-out-of-3 safety computer, the platform supports the development of peripheral applications.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

Fig. 1 is a schematic diagram of the architecture of a cloud computing-based 2-out-of-3 safety computer platform according to an embodiment of the present invention;

Fig. 2 shows the identity switching process triggered when one host of the 2-out-of-3 safety computer fails, according to an embodiment of the present invention.

Fig. 3 is a flow chart of a synchronization module according to an embodiment of the present invention, including initial power-on synchronization and general task synchronization.

Fig. 4 is a flow chart of a voting module according to an embodiment of the present invention, including data exchange, synchronous voting, and output.

Fig. 5 is the packaging and start-up process of the 2-out-of-3 safety computer software application according to an embodiment of the present invention, comprising three steps: packaging an image with Docker containerization technology, allocating computing, storage and network resources, and starting the container.

Fig. 6 is a health check and state following execution flow designed for cloud computing characteristics according to an embodiment of the present invention, which includes two failure situations, namely a virtual host failure and a service node failure. The overlay network can provide a unique virtual subnet of the whole cluster for each physical node and provide a routing function for the virtual host, and if a certain physical node fails, the overlay network can maintain and update a routing table to enable the IP of the virtual host on the failed node to be constantly migrated to the normal physical node.

Fig. 7 is an execution flow of a state following mechanism according to an embodiment of the present invention, which includes three steps of following a request, identity switching, and data inheritance.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.

As used herein, the singular forms "a", "an", "said" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.

With the development of information technology, cloud computing has emerged as an innovative service model and, by virtue of characteristics such as very large scale, virtualization, high reliability, generality, high scalability and on-demand service, has become a key information infrastructure supporting the development of many industries. Cloud computing is a development trend of the current era and a future direction for rail transit applications.

In 2019, cloud computing projects on operational urban rail lines sprang up like bamboo shoots after rain. Cities with large operating networks, such as Beijing, Shanghai, Guangzhou, Shenzhen and Wuhan, are deploying and promoting the construction of urban rail clouds, and emerging metro cities such as Hohhot and Taiyuan are building them as well.

In September 2019, the world's first urban rail cloud project covering multiple lines and multiple service systems was launched in Hohhot. Its top-level design comprises a production center cloud platform, a disaster recovery center cloud platform and a station-and-depot cloud platform; it provides IaaS (infrastructure as a service) to multiple systems and meets the construction requirements of Hohhot rail transit Lines 1 and 2.

On May 20, 2019, Zhengzhou put into operation China's first line-network-level ANCC cloud platform, which integrates cloud, 5G and Internet of Things technologies as the technical foundation of a smart metro and deeply integrates the clearing center with the line centers.

The above examples show that cloud computing has become a development direction in the field of rail transit. The present invention therefore migrates the safety computer platform, one of the core components of rail transit, to the cloud by means of cloud computing technology, and adapts the platform to the characteristics of cloud computing.

Fig. 1 shows a schematic diagram of the architecture of the cloud computing-based 2-out-of-3 safety computer platform. The safety computer platform has a layered architecture consisting of a distributed cloud management center, service nodes, safety computer virtualization containers and physical infrastructure. There is one cloud management center, and the service nodes are hosts. There are three hosts, three safety computer virtualization containers and three sets of physical infrastructure. The cloud management center is in signaling and data communication with each of the three hosts (the first host, the second host and the third host); the hosts correspond one-to-one with the safety computer virtualization containers, and the safety computer virtualization containers correspond one-to-one with the physical infrastructure. Each host is in data communication with its corresponding safety computer virtualization container, which in turn is in data communication with its corresponding physical infrastructure.

In the embodiment of the invention, the 2-out-of-3 safety computer platform software is designed to provide a software working platform for safety-critical systems and to perform communication, application computation, fault tolerance and safety functions. The three groups of corresponding service nodes, safety computer virtualization containers and physical infrastructure form three host structures arranged in parallel. The three host structures operate independently to avoid common-mode faults. They form a loosely coupled redundant structure based on task-level synchronization and exchange data through virtual network technology. A 2-out-of-3 voting mechanism is adopted among the three host structures to ensure the safety, availability and maintainability of the platform. Only the host in the main mode can send information to external devices, which guarantees that the output is unique.

The cloud management center has a distributed structure that supports geographic disaster recovery and protects against single points of failure, so that monitoring of the service nodes and user application processes is not interrupted. If the communication link between any two hosts is interrupted, data can be forwarded through the third host so that data voting continues normally.
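The relay behaviour described above, in which data between two hosts is forwarded through the third host when their direct link is down, can be sketched as a small route selection function. The host names, the `links_up` representation and the function name `find_route` are hypothetical and chosen only for this example.

```python
from typing import FrozenSet, List, Optional, Set

HOSTS = ("host1", "host2", "host3")  # hypothetical host identifiers


def find_route(src: str, dst: str, links_up: Set[FrozenSet[str]]) -> Optional[List[str]]:
    """Return the hop list from src to dst, relaying through the third host if needed."""
    if frozenset((src, dst)) in links_up:
        return [src, dst]                      # direct link is healthy
    relay = next(h for h in HOSTS if h not in (src, dst))
    if frozenset((src, relay)) in links_up and frozenset((relay, dst)) in links_up:
        return [src, relay, dst]               # forward through the third host
    return None                                # src and dst cannot exchange data
```

With only the `host1`-`host2` link interrupted, `find_route("host1", "host2", ...)` yields `["host1", "host3", "host2"]`, so all three hosts can still exchange the data needed for voting.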

In the embodiment of the invention, the 2-out-of-3 safety computer platform software is packaged with Docker container technology and then deployed on a cloud computing platform, where it can be scheduled by the cloud platform. To suit the characteristics of cloud computing, namely virtualization (of network, storage and computing resources), high reliability (multi-copy data fault tolerance, homogeneous service nodes and task handover) and scalability (dynamic expansion of cluster size), the embodiment of the invention designs a health check mechanism and a state following mechanism. The health check monitors and safeguards the application life cycle of the safety computer platform, so that the application restarts automatically after an abnormal interruption. However, the platform application data are destroyed by the fault restart, so the embodiment of the invention also designs a state following mechanism by which the restarted host inherits historical application data from a normally working host, allowing it to come back online immediately and resume service. Thus the 2-out-of-3 safety computer platform of the embodiment not only continues to vote and run normally when any one of the three hosts fails, but also brings the failed host back online quickly and restores the integrity of the platform, which reduces operation and maintenance work.

The embodiment of the invention designs the 2-out-of-3 safety computer platform on the PaaS cloud platform Kubernetes. After the distributed cloud management nodes (the cloud management center) and the three service nodes are deployed, the configuration environment and the software body required by the application can be packaged into an image with a container virtualization technology such as Docker or LXC (Linux Containers), and the image is then deployed on the cloud computing platform. The image can be migrated at any time and the application can be started quickly. The implementation follows this flow: build the cloud computing platform; design the three modules of the 2-out-of-3 safety computer software; package the safety computer platform software into an image with Docker container technology; start the safety computer platform application container from the image; and monitor the platform in real time through health check and state following.

In terms of hardware architecture, the underlying hardware (used to build the cloud computing platform) of the cloud computing-based safety computer platform requires a minimum of four physical servers (one cloud management node and three service nodes) and a maximum of six (three management nodes and three service nodes); the cluster of management nodes forms the cloud management center. The physical configuration of the management nodes and service nodes is shown in Table 1:

Table 1 Physical configuration of management nodes and service nodes

Operating system	CentOS 7 x64
CPU	> 2 cores
Memory	> 2 GB
Storage	> 20 GiB

Each host preempts its main/standby priority according to the power-on (start-up) order, and when a host fails or recovers, the main/standby priorities of the three hosts are updated according to the initial state and the identity switching strategy. Following the safety-core concept, the periodic control of the platform software is divided into several micro-periods; at the end of each micro-period the communication link state is self-diagnosed and reported to the cloud management center through the log system, which shortens the fail-safe response time.

Before designing the software modules, in order to distinguish a normally operating host from a host recovering from a fault, and to execute the identity switching strategy, the embodiment of the invention defines the following five working modes:

1) Power-on mode: the host is in the power-on start-up stage and follows the preemption principle shown in Table 2. Immediately after power-on, the host sends synchronization requests to the other two hosts; the host powered on first receives the largest number of synchronization requests and enters the main working mode.

2) Main working mode: the host is in a normal working state, its calculation result is consistent with that of at least one other host, and its calculation result is used as the sole output of the whole system.

3) Standby working mode: the host is in a normal working state and its calculation result is consistent with that of at least one other host, but it does not output its calculation result externally.

4) Following mode: a host of the 2-out-of-3 safety computer platform that is powered on and started again after a fault enters the following mode once the identity strategy has been executed. In the following mode, the host must wait for the historical state information sent by the host in the main working mode and complete the inheritance of the historical data before it can enter the standby working mode.

5) Reset mode: when a host of the 2-out-of-3 safety redundancy system fails, or its voting result is inconsistent with those of the other two hosts, it enters the reset mode.

Based on these five working modes, Fig. 2 shows the identity switching process triggered when one host of the 2-out-of-3 safety computer fails, according to an embodiment of the present invention.

Hosts in the main working mode and the standby working mode can provide normal application processing functions; hosts in the other working modes cannot.
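The five working modes can be read as a small state machine. The sketch below is one possible reading rather than the patented logic: the names `Mode`, `TRANSITIONS`, `can_transition` and `provides_service` are introduced here, and the transitions marked "assumed" are not stated explicitly in the mode descriptions.

```python
from enum import Enum


class Mode(Enum):
    POWER_ON = "power-on"
    MAIN = "main"
    STANDBY = "standby"
    FOLLOWING = "following"
    RESET = "reset"


# Allowed transitions, taken from the mode descriptions where possible.
TRANSITIONS = {
    # First host powered on becomes main; a recovered host goes to following.
    # POWER_ON -> STANDBY for the other hosts at initial start-up is assumed.
    Mode.POWER_ON: {Mode.MAIN, Mode.STANDBY, Mode.FOLLOWING},
    Mode.MAIN: {Mode.RESET},                  # failure or vote mismatch
    Mode.STANDBY: {Mode.MAIN, Mode.RESET},    # STANDBY -> MAIN assumed (identity switch)
    Mode.FOLLOWING: {Mode.STANDBY},           # after inheriting historical data
    Mode.RESET: {Mode.POWER_ON},              # assumed: the host restarts after reset
}


def can_transition(current: Mode, target: Mode) -> bool:
    """Check whether a mode change is allowed under the table above."""
    return target in TRANSITIONS[current]


def provides_service(mode: Mode) -> bool:
    """Only main and standby hosts provide normal application processing."""
    return mode in (Mode.MAIN, Mode.STANDBY)
```

Encoding the transitions as data keeps the identity switching strategy easy to audit: any attempted mode change that is not listed, such as going from following directly to main, is rejected.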

Table 2 Host power-on synchronization decision logic truth table
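The four rows of Table 2 map directly onto a lookup keyed by the two counters a host keeps during power-on. In this minimal sketch the names `POWER_ON_ORDER` and `power_on_rank` are hypothetical, and returning `None` for any combination outside the three success rows is an assumption.

```python
from typing import Optional

# (synchronization requests received, synchronization signals received) -> power-on order
POWER_ON_ORDER = {
    (2, 0): 1,  # first host powered on
    (1, 1): 2,  # second host powered on
    (0, 1): 3,  # third host powered on
}


def power_on_rank(requests_received: int, signals_received: int) -> Optional[int]:
    """Return 1, 2 or 3 for the host's power-on order, or None on synchronization failure."""
    return POWER_ON_ORDER.get((requests_received, signals_received))
```

A host that obtains rank 1 is the one that received the most synchronization requests and therefore takes the main working mode.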

In the design of the software modules, the three functional modules of the safety computer platform, namely the data communication module, the synchronization module and the 2-out-of-3 voting module, are retained in full.

Unlike conventional Ethernet communication, the data communication module adopts overlay network technology, in which a virtualized network layer (the overlay network) is superimposed on the physical network architecture. With overlay technology, a new virtual subnet can be added on top of the service node subnet; for example, a 10.244.159.0/36 virtual subnet can be set up on a 192.168.1.0/36 physical subnet, which makes the software network environment of the safety computer platform independent and isolated. Under the overlay network, hosts still communicate with one another over sockets, and the communication delay is about 0.14 ms. In the invention, the overlay network acts as both a virtual switch and a virtual router: as a virtual switch, it allocates to each physical service node a virtual subnet that is unique within the platform; as a virtual router, the overlay network instances on the service nodes jointly maintain a routing table so that the virtual hosts on the service nodes can reach one another.

Task-level synchronization includes initial power-on synchronization and general task synchronization. A host enters the power-on mode after start-up; in the power-on mode, the three hosts of the 2-out-of-3 safety redundancy system first perform one synchronization, the initial power-on synchronization, and a host can continue running only after the initial power-on synchronization has completed. General task synchronization is performed once in every task period to correct synchronization and clear the accumulated software clock synchronization error.

Fig. 3 is the execution flow of the synchronization module according to an embodiment of the present invention. The synchronization module includes initial power-on synchronization and general task synchronization. Initial power-on synchronization is performed once whenever a host starts, whether at initial start-up or after fault recovery, to determine the initial identity of each host; the synchronization information sent comprises a synchronization request and a synchronization pulse signal. At start-up, each host sends a synchronization request to the other two hosts, counts the synchronization requests it receives, and switches its identity according to that count; the host that receives the most synchronization requests becomes the main-mode host and is responsible for external output. The main-mode host then sends a synchronization signal to the other two hosts and starts a task period. To distinguish the power-on restart of a faulty host, the synchronization signal frame sent by the main-mode host also carries the identity information of the three hosts. The synchronization is loose synchronization implemented in software; unlike ordinary software synchronization, however, the platform of the embodiment re-corrects the synchronization time against the main-mode host at the end of each synchronization period (general task synchronization), which eliminates the accumulated clock error.
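The initial power-on synchronization can be illustrated with a small simulation. It rests on one assumption that the text implies but does not spell out: a synchronization request only reaches hosts that are already powered on, which is why the first host to start receives the most requests. The function name `simulate_power_on` and the host labels are hypothetical.

```python
from typing import Dict, List, Tuple


def simulate_power_on(order: List[str]) -> Dict[str, Tuple[int, int]]:
    """Return {host: (requests received, signals received)} for a given power-on order."""
    requests = {host: 0 for host in order}
    powered: List[str] = []
    for host in order:
        # The starting host sends a request to the other two hosts;
        # only hosts that are already running can receive it (assumption).
        for peer in powered:
            requests[peer] += 1
        powered.append(host)
    # The host with the most requests becomes the main-mode host and sends
    # one synchronization signal to each of the other two hosts.
    main = max(order, key=lambda h: requests[h])
    signals = {host: (0 if host == main else 1) for host in order}
    return {host: (requests[host], signals[host]) for host in order}
```

For the power-on order B, C, A the simulation gives `(2, 0)` for B, `(1, 1)` for C and `(0, 1)` for A, which reproduces the three successful rows of Table 2.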

Fig. 4 is a flow chart of the voting module according to an embodiment of the present invention, comprising data exchange, synchronous voting, and output. The 2-out-of-3 voting module uses three data comparison algorithms. Unlike the hardware voting of a traditional safety computer, the cloud computing-based 2-out-of-3 safety computer platform uses pure software voting, so the voting module is decoupled from the hardware. Each host exchanges data with the other two hosts so that input data, output data, and other necessary intermediate state information can be voted on. During data voting, the local data and the data from the other two hosts, three copies in total, are compared pairwise, and two identical copies are selected as the output of the whole system.
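The pairwise comparison principle can be sketched as follows. This is an illustrative function, not the patented implementation; the name `vote_2oo3` and the use of `bytes` for the exchanged data are assumptions.

```python
from typing import Optional

def vote_2oo3(local: bytes, peer_a: bytes, peer_b: bytes) -> Optional[bytes]:
    """Pairwise comparison of three copies.

    Returns a value on which at least two copies agree, or None when all
    three copies differ and no output may be produced.
    """
    if local == peer_a or local == peer_b:
        return local
    if peer_a == peer_b:
        return peer_a
    return None

result = vote_2oo3(b"\x10\x20", b"\x10\x20", b"\x10\x21")
print(result)
```

Returning `None` rather than an arbitrary copy keeps the sketch in line with the fail-safe idea: when no two copies agree, nothing is output.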

The voting modes of the embodiment of the invention comprise bitwise voting, selective voting, and median voting. Each mode is introduced below:

1) The input data are not consistent, but the output data must be consistent (median comparison).

For timestamps and random numbers, processor clock drift and randomness mean that the data generated by the individual hosts cannot be guaranteed to be consistent, so such data is classified as type 1). Although clock drift exists, its effect within one period can be tolerated, so the data is processed by taking a median value, i.e. D = (D₁ + D₂ + D₃)/3, which gives all hosts the same data after comparison. Depending on the requirements and data characteristics of the actual application, the maximum value D = Max(D₁, D₂, D₃), the minimum value D = Min(D₁, D₂, D₃), or another algorithm may be used instead.

2) The data to be compared on the three hosts are not completely the same, and the consistent data in their intersection must be output (selective comparison).

Because the three hosts cannot be perfectly synchronized, a host may receive several frames from the same communication peer within one period (the frames can be ordered from older to newer; otherwise they are treated as redundant data). In this case, the data provided to the upper-layer application is the latest data for which the comparison succeeds, which ensures that the upper-layer application processes the latest trusted data.

3) The data of the two hosts being compared must be strictly consistent, so a bit-by-bit comparison is required (bitwise comparison).

In a bit-by-bit comparison, a result can be output only if the two copies being compared are completely identical. If even one bit differs, the comparison fails and the data cannot be output.
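The three comparison modes above can be sketched side by side. The sketch follows the formula given for mode 1), which is the arithmetic mean of the three values even though the text names it a median. For mode 2) it assumes that each frame carries an increasing sequence number, which is a hypothetical way to tell older from newer frames; all function names are illustrative.

```python
def mean_compare(d1, d2, d3):
    """Mode 1: D = (D1 + D2 + D3) / 3, identical on every host."""
    return (d1 + d2 + d3) / 3

def selective_compare(frames_by_host):
    """Mode 2: newest frame that every host has received.

    frames_by_host is a list with one collection of sequence numbers per
    host (assumed representation). Returns None if nothing is common.
    """
    common = set.intersection(*(set(frames) for frames in frames_by_host))
    return max(common) if common else None

def bitwise_compare(a: bytes, b: bytes):
    """Mode 3: output only if the two copies are identical in every bit."""
    return a if a == b else None

print(mean_compare(10, 11, 12))
print(selective_compare([[5, 6, 7], [6, 7], [5, 6]]))
print(bitwise_compare(b"\x01\x02", b"\x01\x03"))
```

The maximum or minimum variants mentioned for mode 1) would replace the body of `mean_compare` with `max(d1, d2, d3)` or `min(d1, d2, d3)`.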

After the three software modules of the 2-out-of-3 safety computer have been designed, the corresponding software is packaged into the required image using Docker container technology. Once the image has been packaged, the 2-out-of-3 safety computer platform of the invention can build a container as shown in Fig. 5 by importing the packaged image and attaching the corresponding resources (memory, storage, and network resources), and then start the 2-out-of-3 virtual host.

With container technology, the invention can bring a virtual host online quickly on any physical node.
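As a rough illustration of starting a virtual host from a packaged image with memory, storage and network resources attached, the sketch below only builds a `docker run` command line and does not execute it. The options used (`--detach`, `--name`, `--memory`, `--network`, `--volume`) are standard Docker CLI options; the image name, container name, network name and paths are hypothetical and do not come from the patent.

```python
import shlex

def docker_run_command(image, name, memory, network, volume):
    """Build (but do not run) a `docker run` command for one virtual host."""
    return [
        "docker", "run", "--detach",
        "--name", name,        # container name for this virtual host
        "--memory", memory,    # memory limit, e.g. "512m"
        "--network", network,  # name of an existing Docker network
        "--volume", volume,    # storage mapping "host_path:container_path"
        image,
    ]

# Hypothetical values, for illustration only.
cmd = docker_run_command(
    "safety-2oo3:latest", "host-a", "512m", "overlay-net", "/data/host-a:/data"
)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run` on a node with Docker installed would start the container; the same command with a different `--name` could be used on any other physical node that has the image.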

Finally, to satisfy the fail-safe principle (a fault must lead to a safe state) and to adapt it to cloud computing technology, the invention designs a health check mechanism and a state following mechanism, so that after a host fault the platform can recover to a safe state and inherit the current application variables and state data. Fig. 6 shows the health check and state following execution flow designed for the characteristics of cloud computing according to an embodiment of the present invention; it covers two failure cases, a virtual host failure and a service node failure. The overlay network provides each physical node with a unique virtual subnet within the whole cluster and provides routing for the virtual hosts. If a physical node fails, the overlay network maintains and updates the routing table so that the virtual host on the failed node is migrated to a healthy physical node with its IP address unchanged.

The health check mechanism is a form of self-diagnosis. The running state of the applications in the platform is checked periodically by TCP (Transmission Control Protocol), exec, and HTTP (Hypertext Transfer Protocol) checks. The TCP and HTTP checks initiate a connection request to verify that the application's IP address and port are open. The exec check runs a user-defined diagnostic script that monitors the application state and triggers self-start recovery, restarting the application when the state is abnormal. Under this mechanism, the number of running hosts is always kept at three.
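A minimal sketch of the TCP and exec checks, using only the Python standard library, is given below. It illustrates the idea of probing and then triggering a restart; the function names and the `restart` callback are assumptions, and a real platform would run `check_and_recover` periodically.

```python
import socket
import subprocess
import sys

def tcp_probe(host, port, timeout=1.0):
    """TCP check: True if a connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def exec_probe(argv, timeout=5.0):
    """exec check: True if the diagnostic command exits with status 0."""
    try:
        return subprocess.run(argv, timeout=timeout).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False

def check_and_recover(probes, restart):
    """Run all probes; call restart() and return False if any probe fails."""
    if all(probe() for probe in probes):
        return True
    restart()
    return False

# Example: a diagnostic "script" that simply exits with status 0.
healthy = check_and_recover(
    [lambda: exec_probe([sys.executable, "-c", "raise SystemExit(0)"])],
    restart=lambda: print("restarting application"),
)
print(healthy)
```

An HTTP check could be added in the same style with `urllib.request.urlopen`, treating any exception or error status as a failed probe.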

Fig. 7 shows the execution flow of the state following mechanism according to an embodiment of the present invention, which comprises three steps: following request, identity switching, and data inheritance. The state following mechanism mainly solves the problem of data inheritance. The data of the application services inside the 2-out-of-3 safety computer platform is updated very frequently; if the application were connected to a database, the frequent interaction between the application and the database could interfere with normal operation, which is particularly unfavorable for applications with high precision requirements, large resource usage, and large amounts of voting data. When a failed host has been repaired and powered on again, the identity modes of all hosts are updated according to the identity switching strategy. The restarted host is then in following mode: it waits for state following, obtains the variable data and the internal running state of the application from a normally running host over a socket, and performs data recovery and inheritance. With all host identities recorded, the main-mode host collects the historical application variables and state data at the end of the current task cycle and sends them to the restarted host. When the next cycle arrives, the restarted host runs in synchronization with the normally working hosts after general task synchronization. Unlike storing application data in a database, state following therefore needs only one interaction to solve the data inheritance problem.

The state following data comprises the following:

1) The timestamp and cycle number of the main-mode host at the moment the historical information is sent. After receiving the historical information, the restarted host first performs time correction, i.e. it adjusts its own timestamp and cycle number to match those of the main-mode host. This keeps the host's timestamp and cycle number within the allowable range and avoids misjudging the validity of messages because of the timestamp or cycle number.

2) Application input data.

3) Information related to the communication link management table.

4) Other necessary intermediate state data of the application.
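The single-interaction transfer of state following data can be sketched as below. The description only states that the data is obtained over a socket; the JSON encoding, the newline framing, the `FollowData` field names and the use of `socket.socketpair()` to stand in for the link between two hosts are all assumptions made for illustration.

```python
import json
import socket
from dataclasses import dataclass, field, asdict

@dataclass
class FollowData:
    timestamp: float                                # main-mode host timestamp at send time
    cycle: int                                      # main-mode host cycle number at send time
    inputs: dict = field(default_factory=dict)      # application input data
    links: dict = field(default_factory=dict)       # communication link table information
    app_state: dict = field(default_factory=dict)   # other intermediate state data

def send_follow_data(sock, data):
    """Main-mode host side: send one newline-terminated JSON record."""
    sock.sendall(json.dumps(asdict(data)).encode() + b"\n")

def receive_follow_data(sock):
    """Restarted host side: read one record and rebuild the data object."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed before a full record arrived")
        buf += chunk
    return FollowData(**json.loads(buf))

def correct_time(local_clock, data):
    """Time correction: adopt the main-mode host's timestamp and cycle number."""
    local_clock["timestamp"] = data.timestamp
    local_clock["cycle"] = data.cycle

main_side, follower_side = socket.socketpair()
send_follow_data(main_side, FollowData(1000.5, 42, {"speed": 3}, {"peer": "up"}, {"mode": "run"}))
restored = receive_follow_data(follower_side)
clock = {"timestamp": 0.0, "cycle": 0}
correct_time(clock, restored)
print(clock, restored.inputs)
main_side.close()
follower_side.close()
```

Time correction is applied before any other restored data is used, mirroring the order given in item 1).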

In summary, the application and operating environment of the embodiment of the invention are containerized, lightweight, and easy to migrate and deploy. The platform diagnoses itself, recovers from faults immediately, and inherits historical variables and state data. The distributed cloud management center provides real-time monitoring and resource scheduling of the lower-layer physical service nodes. The platform offers geographic disaster tolerance and protection against single points of failure; the failure of one host does not affect the function of the 2-out-of-3 safety computer, and normal operation is restored within about 3 s. The platform is also extensible: besides providing the basic functions of the 2-out-of-3 safety computer, it can support peripheral applications such as a front-end display of the safety computer platform's network traffic, host identity modes, and memory and CPU usage.

Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.

The embodiments in the present specification are described in a progressive manner; for the parts that are the same or similar among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the other embodiments. In particular, apparatus or system embodiments are described relatively briefly because they are substantially similar to the method embodiments, and for the relevant parts reference may be made to the description of the method embodiments. The above-described embodiments of the apparatus and system are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A cloud computing-based 2-out-of-3 secure computer platform, comprising: a cloud management center, service nodes, safety computer virtualization containers, and a physical infrastructure, which form a layered architecture and are arranged in that order from top to bottom; there is one cloud management center, the service nodes are three hosts, and the cloud management center is in signaling and data communication with each of the three hosts; the hosts are in one-to-one correspondence with the safety computer virtualization containers, and the safety computer virtualization containers are in one-to-one correspondence with the physical infrastructures; each host is in data communication with its corresponding safety computer virtualization container, and each safety computer virtualization container is in data communication with its corresponding physical infrastructure.

2. The cloud computing-based secure computer platform of claim 1, wherein the three hosts operate independently, form a loosely coupled redundant structure based on task-level synchronization, and exchange data through virtual network technology; a 2-out-of-3 voting mechanism is used among the three hosts, and only the host in main mode can send information to external equipment.

3. The cloud computing-based secure computer platform of claim 1, wherein the cloud management center has a distributed structure, provides geographic disaster recovery and protection against single-point failures, and monitors the service nodes and user application processes without interruption; and after the communication link between any two hosts is interrupted, data is forwarded through the third host, so that data voting continues to operate normally.

4. The cloud computing-based secure computer platform of claim 1, wherein after the distributed cloud management center and the three service nodes are deployed, the configuration environment and the software body required by an application are packaged into an image through container virtualization technology, the image is deployed on the cloud computing platform, the secure computer platform application container is started from the image, and the image can be migrated and started at any time.

5. The cloud computing-based 2-out-of-3 secure computer platform of any one of claims 1 to 4, wherein each host preempts the primary and secondary priorities according to a power-on sequence, and updates the primary and secondary priorities of three hosts according to an initial state and an identity switching policy during failure and recovery;

the working modes of the host comprise five working modes as follows:

TABLE 2

。

7. The cloud computing-based secure computer platform of claim 5, wherein each host enters a power-on mode after being started; in the power-on mode, each host of the 2-out-of-3 safety redundant system performs initial power-on synchronization once; each host sends a synchronization request to the other two hosts when it starts, counts the synchronization requests it receives, and switches its identity according to that count; and the host that receives the most synchronization requests is the host in main mode;

8. The cloud computing-based 2-out-of-3 secure computer platform as recited in claim 5, wherein when each host exchanges data with the other two hosts, input and output data and intermediate status information are voted in a manner including bit-by-bit voting, selective voting and median voting:

9. The cloud computing-based 2-out-of-3 secure computer platform as claimed in claim 5, wherein the platform uses a health check mechanism for fault self-diagnosis, periodically checks the running state of the applications inside the platform by TCP, exec or HTTP, initiates a connection request through TCP and HTTP to check that the application IP address and port are open, executes a custom diagnostic script through exec to monitor the application state and trigger self-start recovery, and restarts the application when the state is abnormal.

10. The cloud computing-based 2-out-of-3 secure computer platform as claimed in claim 5, wherein after a failed host is maintained and powered up again, a state following mechanism is used to obtain state following data from a normally running host in a socket manner, and data recovery and inheritance are performed according to the state following data;

the state following data includes:

2) application input data;

3) information related to the communication link management table;

4) intermediate state data of the application.