CN113050407A

CN113050407A - Method for determining and switching master controller and slave controller of distributed processing system

Info

Publication number: CN113050407A
Application number: CN202110243660.2A
Authority: CN
Inventors: 李成文; 杨军祥; 湛文韬; 何立军; 秦琪; 丰生磊
Original assignee: Xian Aeronautics Computing Technique Research Institute of AVIC
Current assignee: Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date: 2021-03-04
Filing date: 2021-03-04
Publication date: 2021-06-29
Anticipated expiration: 2041-03-04
Also published as: CN113050407B

Abstract

The invention discloses a method for determining and switching the master and standby controllers of a distributed processing system. The method comprises: providing two system controllers in the distributed processing system that operate in a 1+1 hot-backup mode, one acting as the master controller and the other as the standby controller; configuring the initial states of the two system controllers, and setting a default master controller and the master controller's occupancy flag; defining control signals that govern master/standby switching, and configuring a switching strategy for each control-signal state; and, during normal operation in the 1+1 hot-backup mode, controlling the two system controllers according to the control signals and the switching strategy. The two controllers provide fault tolerance, which improves system reliability. The master and standby identities are simple to determine, and switching is efficient.

Description

Method for determining and switching master controller and slave controller of distributed processing system

Technical Field

The invention belongs to the technical field of embedded computer system design.

Background

As embedded systems grow more complex, processing systems are required to deliver higher functional performance, and distributed processing systems have become complex, multifunctional, multitasking computer systems.

In the prior art, the management of operating resources and fault switchover are generally implemented through software-based decision logic. The fault detection and decision process is relatively complex and time-consuming, so these methods are inefficient and offer poor real-time performance.

Disclosure of Invention

The invention provides a method for determining and switching the master and standby controllers of a distributed processing system, which addresses the complex and inefficient switching of the prior art.

To achieve this, the invention adopts the following technical solution:

A method for determining and switching the master and standby controllers of a distributed processing system comprises the following steps:

providing two system controllers in the distributed processing system, wherein the two system controllers operate in a 1+1 hot-backup mode, one acting as the master controller and the other as the standby controller;

configuring the initial states of the two system controllers, and setting a default master controller and the master controller's occupancy flag;

defining control signals that govern master/standby switching, and configuring a switching strategy for each control-signal state; during normal operation in the 1+1 hot-backup mode, the two system controllers are controlled according to the control signals and the switching strategy.

Further, configuring the initial states of the two system controllers and setting the default master controller and the master controller's occupancy flag comprises:

when the distributed processing system starts, first initializing the system hardware and setting the initial state of both system controllers to standby mode;

the system performs the power-on self-test and records the result; the system reads the slot identification number of each system controller, sets the system controller with slot identification number 1 as the default master controller, and sets the system controller with slot identification number 2 as the standby controller; the occupancy flag of the master controller is set valid.
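The start-up sequence above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the names `Role`, `Controller`, `power_on_init`, and `run_self_test` are hypothetical, and the sketch only records the self-test result because the text does not say how a failed self-test affects role assignment.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    STANDBY = "standby"
    MASTER = "master"


@dataclass
class Controller:
    slot_id: int                    # slot identification number (1 or 2)
    role: Role = Role.STANDBY       # both controllers start in standby mode
    occupancy_flag: bool = False    # valid (True) only on the acting master
    self_test_passed: bool = False  # recorded power-on self-test result


def power_on_init(controllers, run_self_test=lambda c: True):
    """Standby first, then self-test, then slot-based role assignment."""
    for c in controllers:
        c.role = Role.STANDBY
        c.occupancy_flag = False
    for c in controllers:
        # Only record the result; reacting to a failure is not specified.
        c.self_test_passed = run_self_test(c)
    for c in controllers:
        if c.slot_id == 1:
            c.role = Role.MASTER      # slot 1 is the default master
            c.occupancy_flag = True   # master sets its occupancy flag valid
    return controllers


# The order in which the controllers are listed does not matter.
ctrls = power_on_init([Controller(slot_id=2), Controller(slot_id=1)])
```

Keying the default role on the slot number means neither controller has to negotiate with the other at start-up, which is what makes the identity decision simple.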

Further, the control signals that govern master/standby switching are a system fault signal and a network communication fault signal, each of which has its own signal state.

Further, the system fault signal and the network communication fault signal each have two states: valid and invalid.

Further, when the system controller with slot identification number 1 is the master controller and its system fault signal or network communication fault signal is valid, the master role is switched to the system controller with slot identification number 2, and the occupancy flag of the system controller with slot identification number 2 is set valid;

when the network communication fault signals of both system controllers are valid, the distributed processing system enters an emergency operating state in which only the two system controllers operate;

when the system fault signals of both system controllers are valid, the distributed processing system enters a failure state.
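The switching strategy above amounts to a small decision table, sketched here as a pure function. The names `SystemState` and `evaluate` are hypothetical. Two points are assumptions of this sketch, because the text does not state them: the double system-fault case is checked before the double network-fault case, and the master slot is left unchanged in both degraded states. Only the switch from slot 1 to slot 2 is modeled, since that is the only transition the text describes.

```python
from enum import Enum


class SystemState(Enum):
    NORMAL = "normal"
    EMERGENCY = "emergency"  # network fault signal valid on both controllers
    FAILED = "failed"        # system fault signal valid on both controllers


def evaluate(master_slot, sysfail, fcfail):
    """Return (new_master_slot, system_state).

    sysfail and fcfail map a slot number (1 or 2) to True when the
    corresponding fault signal is valid.
    """
    if sysfail[1] and sysfail[2]:
        return master_slot, SystemState.FAILED      # priority is an assumption
    if fcfail[1] and fcfail[2]:
        return master_slot, SystemState.EMERGENCY
    if master_slot == 1 and (sysfail[1] or fcfail[1]):
        return 2, SystemState.NORMAL                # fail over to slot 2
    return master_slot, SystemState.NORMAL


ok = {1: False, 2: False}
new_master, state = evaluate(1, {1: True, 2: False}, ok)
```

Because the decision depends only on four signal bits and the current master slot, it can be evaluated without the longer software diagnosis that the background section criticizes.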

Further, the system fault signal becomes valid when a watchdog timeout fault or a system-management-layer software fault occurs in the distributed processing system; at all other times the signal is invalid;

the network communication fault signal becomes valid when a network communication failure occurs in the distributed processing system.
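The trigger conditions can be illustrated with a small model of one controller's fault signals. The class `FaultSignals`, its attributes, and the injectable `now` clock are hypothetical names introduced for this sketch. In the described system the watchdog is a hardware monitor, so the software timer here only stands in for it.

```python
import time


class FaultSignals:
    """Derives the two fault signals for a single controller."""

    def __init__(self, watchdog_timeout_s, now=time.monotonic):
        self.watchdog_timeout_s = watchdog_timeout_s
        self._now = now
        self._last_kick = now()
        self.software_fault = False  # set by system management software
        self.network_fault = False   # set when network communication fails

    def kick_watchdog(self):
        """Called periodically by healthy software to restart the timer."""
        self._last_kick = self._now()

    @property
    def sysfail(self):
        # Valid on a watchdog timeout or a software-reported fault,
        # invalid at all other times.
        timed_out = self._now() - self._last_kick > self.watchdog_timeout_s
        return timed_out or self.software_fault

    @property
    def fcfail(self):
        # Valid only on a network communication failure.
        return self.network_fault


# A fake clock makes the timeout behaviour easy to demonstrate.
clock = [0.0]
signals = FaultSignals(watchdog_timeout_s=1.0, now=lambda: clock[0])
clock[0] = 1.5                       # software stopped kicking the watchdog
timed_out_sysfail = signals.sysfail  # True: timeout exceeded
```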

Further, the method is stored as a computer program in the memory of a computer; the computer comprises a processor and a memory, and the steps of the method for determining and switching the master and standby controllers of the distributed processing system are carried out when the processor executes the computer program.

Further, the method is stored as a computer program in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method for determining and switching the master and standby controllers of the distributed processing system are carried out.

Compared with the prior art, the invention has the following technical features:

In this scheme, the master and standby controllers operate in a 1+1 hot-backup mode. After a normal start-up, system management reads the controller slot identification numbers; by default the controller in slot 1 runs as the master controller and the controller in slot 2 runs as the standby controller, the master controller's occupancy flag is set valid, and the master role is switched when a fault occurs. The master controller is fault tolerant and controls and manages the software and hardware resources of the distributed processing system so that they work cooperatively. Using dual controllers together with the system fault signal and the network communication fault signal enables fast switchover and task recovery. The two controllers provide fault tolerance, which improves system reliability; the master and standby identities are simple to determine, and switching is efficient.

Drawings

FIG. 1 is a schematic diagram of the system structure of the method of the present invention.

Detailed Description

The software and hardware resources and system tasks of a distributed processing system need a controller with a master/standby fault-tolerance mechanism so that they can work cooperatively under unified control and management. Determining the master and standby controllers at start-up, and switching them when a fault occurs during operation, are key functions of the distributed processing system and bear directly on the reliability of application tasks. They are especially important for restoring operation quickly and efficiently in a hard real-time system. In this scheme, dual controllers together with the system fault signal and the network communication fault signal enable fast switchover and task recovery.

The invention discloses a method for determining and switching the master and standby controllers of a distributed processing system, which comprises the following steps:

providing two system controllers in the distributed processing system, wherein the two system controllers operate in a 1+1 hot-backup mode, one acting as the master controller and the other as the standby controller; configuring the initial states of the two system controllers, and setting a default master controller and the master controller's occupancy flag; defining control signals that govern master/standby switching, and configuring a switching strategy for each control-signal state; during normal operation in the 1+1 hot-backup mode, the two system controllers are controlled according to the control signals and the switching strategy.

Configuring the initial states of the two system controllers and setting the default master controller and the master controller's occupancy flag comprises:

when the distributed processing system starts, first initializing the system hardware and setting the initial state of both system controllers to standby mode; the system performs the power-on self-test and records the result; the system reads the slot identification number of each system controller, sets the system controller with slot identification number 1 as the default master controller, and sets the system controller with slot identification number 2 as the standby controller; the occupancy flag of the master controller is set valid.

The control signals that govern master/standby switching are a system fault signal and a network communication fault signal, each of which has its own signal state. Each signal has two states: valid and invalid.

When the system controller with slot identification number 1 is the master controller and its system fault signal or network communication fault signal is valid, the master role is switched to the system controller with slot identification number 2, and the occupancy flag of the system controller with slot identification number 2 is set valid; when the network communication fault signals of both system controllers are valid, the distributed processing system enters an emergency operating state in which only the two system controllers operate; when the system fault signals of both system controllers are valid, the distributed processing system enters a failure state.

The system fault signal becomes valid when a watchdog timeout fault or a system-management-layer software fault occurs in the distributed processing system; at all other times the signal is invalid.

The method is implemented through the following steps:

The distributed processing system is provided with two system controllers that operate in a 1+1 hot-backup mode. When the processing system starts, the hardware is initialized and both controllers are placed in standby mode. The system performs the power-up built-in test (PUBIT) and records the result. System management then reads the controller slot identification numbers: controller 1 in slot 1 is set to run as the master controller, controller 2 in slot 2 is set to run as the standby controller, and the occupancy flag CONFLAG of controller 1 is set valid.

During normal operation in the 1+1 hot-backup mode, the two controllers are governed by the system fault signal SYSFAIL and the network communication fault signal FCFAIL. When controller 1 is the master controller and its SYSFAIL or FCFAIL signal becomes valid, the master role switches to controller 2, and controller 2 sets its occupancy flag CONFLAG valid. When the FCFAIL signals of both controllers are valid, network communication has failed; the processing system enters the emergency operating state, in which only the controllers operate. When the SYSFAIL signals of both controllers are valid, both controllers have failed and the processing system enters the failure state.

SYSFAIL becomes valid on a watchdog timeout fault (passive: hardware detects that the software has run away) or on a fault set by software (active). FCFAIL becomes valid on a network communication failure.
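To make the CONFLAG handover in this embodiment concrete, the sketch below steps a stateful model through a sequence of SYSFAIL/FCFAIL samples. The class `DualControllerSystem` and its `step` method are hypothetical. The sketch also assumes that the emergency and failure states latch once entered, and that the former master's CONFLAG is cleared on switchover; the text specifies neither.

```python
class DualControllerSystem:
    """Tracks CONFLAG ownership and system state across signal samples."""

    def __init__(self):
        # After PUBIT, controller 1 (slot 1) is master and holds CONFLAG.
        self.conflag = {1: True, 2: False}
        self.state = "normal"

    @property
    def master(self):
        # The master is the controller whose CONFLAG is valid.
        return next(slot for slot, held in self.conflag.items() if held)

    def step(self, sysfail, fcfail):
        """sysfail / fcfail map slot -> bool, True meaning the signal is valid."""
        if self.state != "normal":
            return self.master, self.state        # degraded states latch (assumption)
        if sysfail[1] and sysfail[2]:
            self.state = "failed"                 # both controllers have failed
        elif fcfail[1] and fcfail[2]:
            self.state = "emergency"              # network lost on both controllers
        elif self.master == 1 and (sysfail[1] or fcfail[1]):
            self.conflag = {1: False, 2: True}    # controller 2 takes CONFLAG
        return self.master, self.state


OK = {1: False, 2: False}
system = DualControllerSystem()
trace = [
    system.step(OK, OK),                    # healthy: controller 1 stays master
    system.step(OK, {1: True, 2: False}),   # FCFAIL on controller 1: switch to 2
    system.step(OK, {1: True, 2: True}),    # FCFAIL on both: emergency state
]
```

Deriving the master from CONFLAG, rather than storing the role separately, keeps a single source of truth for which controller currently owns the system.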

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equally replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application, and are intended to be included within the scope of the present application.

Claims

1. A method for determining and switching the master and standby controllers of a distributed processing system, characterized by comprising the following steps:

2. The method for determining and switching the master and standby controllers of a distributed processing system according to claim 1, wherein configuring the initial states of the two system controllers and setting the default master controller and the master controller's occupancy flag comprises:

3. The method for determining and switching the master and standby controllers of a distributed processing system according to claim 1, wherein the control signals that govern master/standby switching are a system fault signal and a network communication fault signal, each of which has its own signal state.

4. The method for determining and switching the master and standby controllers of a distributed processing system according to claim 1, wherein the system fault signal and the network communication fault signal each have two states: valid and invalid.

5. The method for determining and switching the master and standby controllers of a distributed processing system according to claim 1, wherein, when the system controller with slot identification number 1 is the master controller and its system fault signal or network communication fault signal is valid, the master role is switched to the system controller with slot identification number 2, and the occupancy flag of the system controller with slot identification number 2 is set valid;

6. The method for determining and switching the master and standby controllers of a distributed processing system according to claim 1, wherein:

7. The method for determining and switching the master and standby controllers of a distributed processing system according to claim 1, wherein the method is stored as a computer program in a memory of a computer; the computer comprises a processor, a memory, and a computer program which, when executed by the processor, performs the steps of the method according to any one of claims 1 to 6.

8. The method for determining and switching the master and standby controllers of a distributed processing system according to claim 1, wherein the method is stored as a computer program in a computer-readable storage medium; the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.