CN113050407B - Method for determining and switching master controller and slave controller of distributed processing system - Google Patents

Info

Publication number
CN113050407B
Authority
CN
China
Prior art keywords
controller
main controller
distributed processing
controllers
processing system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110243660.2A
Other languages
Chinese (zh)
Other versions
CN113050407A (en)
Inventor
李成文
杨军祥
湛文韬
何立军
秦琪
丰生磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN202110243660.2A
Publication of CN113050407A
Application granted
Publication of CN113050407B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B9/00: Safety arrangements
    • G05B9/02: Safety arrangements electric
    • G05B9/03: Safety arrangements electric with multiple-channel loop, i.e. redundant control systems

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses a method for determining and switching the master controller and the slave controller of a distributed processing system. The method comprises: providing two system controllers in the distributed processing system, which work in a 1+1 hot backup mode, one as the master controller and the other as the slave controller; configuring the initial states of the two system controllers, and setting a default master controller and the master controller's occupancy flag; defining control signals that govern master/slave switching, and configuring a master/slave switching strategy for the different control signal states; and, during normal operation in the 1+1 hot backup mode, controlling the two system controllers by the control signals and the switching strategy. With this scheme, the dual-controller arrangement is fault tolerant, which improves system reliability; the master and standby identities of the two controllers are determined simply, and the switching is efficient.

Description

Method for determining and switching master controller and slave controller of distributed processing system
Technical Field
The invention belongs to the technical field of embedded computer system design.
Background
As embedded systems grow more complex, higher functional performance is demanded of the processing system, and the distributed processing system has become a complex multifunctional, multitasking computer system.
In the prior art, resource management and fault switching are generally implemented by software-based discrimination. The fault discrimination and decision process is relatively complex and time-consuming, so the approach has low efficiency and poor real-time performance.
Disclosure of Invention
The invention provides a method for determining and switching the master controller and the slave controller of a distributed processing system, which addresses the complex switching procedure and low efficiency of the prior art.
To achieve this, the invention adopts the following technical solution:
A method for determining and switching the master and slave controllers of a distributed processing system comprises the following steps:
providing two system controllers in the distributed processing system, which work in a 1+1 hot backup mode, one as the master controller and the other as the slave controller;
configuring the initial states of the two system controllers, and setting a default master controller and the master controller's occupancy flag;
defining control signals that govern master/slave switching, and configuring a master/slave switching strategy for the different control signal states; during normal operation in the 1+1 hot backup mode, the two system controllers are controlled by the control signals and the switching strategy.
Further, configuring the initial states of the two system controllers and setting the default master controller and the master controller's occupancy flag comprises:
when the distributed processing system starts, first initializing the system hardware and setting the initial state of both system controllers to standby mode;
the system performs a power-on self-test and records the result; the system reads the slot identification number of each system controller, sets the system controller with slot identification number 1 as the default master controller and the system controller with slot identification number 2 as the slave controller, and sets the master controller's occupancy flag to valid.
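The start-up sequence described above can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the names Controller, Role, run_power_on_test and determine_roles are hypothetical, and the power-on test is a placeholder that always passes, because the source does not say what the test checks or how its result affects role assignment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Role(Enum):
    STANDBY = "standby"  # initial state of both controllers after hardware init
    MASTER = "master"
    SLAVE = "slave"


@dataclass
class Controller:
    slot_id: int                        # slot identification number read at start-up
    role: Role = Role.STANDBY
    occupancy_flag: bool = False        # True means "valid": this controller holds the master role
    test_passed: Optional[bool] = None  # recorded power-on test result


def run_power_on_test(ctrl: Controller) -> bool:
    """Placeholder for the power-on test (assumption: always passes)."""
    return True


def determine_roles(controllers: List[Controller]) -> List[Controller]:
    # Step 1: after hardware initialization, both controllers start in standby mode.
    for ctrl in controllers:
        ctrl.role = Role.STANDBY
        ctrl.occupancy_flag = False
    # Step 2: run the power-on test and record the result.
    for ctrl in controllers:
        ctrl.test_passed = run_power_on_test(ctrl)
    # Step 3: assign roles from the slot identification number.
    for ctrl in controllers:
        if ctrl.slot_id == 1:
            ctrl.role = Role.MASTER
            ctrl.occupancy_flag = True  # the master's occupancy flag is set to valid
        elif ctrl.slot_id == 2:
            ctrl.role = Role.SLAVE
    return controllers


if __name__ == "__main__":
    for c in determine_roles([Controller(slot_id=2), Controller(slot_id=1)]):
        print(c.slot_id, c.role.value, c.occupancy_flag)
```

Because the role depends only on the slot identification number, no negotiation between the two controllers is needed at start-up.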
Further, the control signals that govern master/slave switching are a system fault signal and a network communication fault signal, each of which has its own signal state.
Further, the state of the system fault signal and of the network communication fault signal is either valid or invalid.
Further, when the system controller with slot identification number 1 is the master controller and its system fault signal or network communication fault signal is valid, the system controller with slot identification number 2 takes over as master controller, and the occupancy flag of the system controller with slot identification number 2 is set to valid;
when the network communication fault signals of both system controllers are valid, the distributed processing system enters an emergency working state in which only the two system controllers work;
when the system fault signals of both system controllers are valid, the distributed processing system enters a failed state.
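The three rules above can be written as one pure decision function. The Python sketch below is illustrative only: Signals, SystemState and decide are hypothetical names, and the order in which the rules are checked (failed state first, then emergency state, then switchover) is an assumption. The source also does not say what happens when controller 2 is master and faults, or when the two controllers show different faults, so the sketch applies the slot 1 rule literally and otherwise leaves the master unchanged.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class SystemState(Enum):
    NORMAL = "normal"
    EMERGENCY = "emergency"  # network communication fault on both controllers
    FAILED = "failed"        # system fault on both controllers


@dataclass(frozen=True)
class Signals:
    system_fault: bool = False   # True means the signal state is "valid"
    network_fault: bool = False


def decide(master_slot: int, slot1: Signals, slot2: Signals) -> Tuple[int, SystemState]:
    """Return (slot of the master after the decision, resulting system state)."""
    # Assumed priority: a system fault on both controllers overrides everything.
    if slot1.system_fault and slot2.system_fault:
        return master_slot, SystemState.FAILED
    if slot1.network_fault and slot2.network_fault:
        return master_slot, SystemState.EMERGENCY
    # Rule stated in the text: slot 1 is master and one of its signals is valid.
    if master_slot == 1 and (slot1.system_fault or slot1.network_fault):
        return 2, SystemState.NORMAL
    # Cases the text does not cover: keep the current master (assumption).
    return master_slot, SystemState.NORMAL


if __name__ == "__main__":
    print(decide(1, Signals(system_fault=True), Signals()))
```

Since the decision depends only on four boolean signal states and the current master, it can be evaluated in constant time, which fits the stated aim of avoiding a lengthy software discrimination process.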
Further, the system fault signal becomes valid when a watchdog timeout fault or a system management layer software fault occurs in the distributed processing system; at all other times it is invalid;
the network communication fault signal becomes valid when a network communication fault occurs in the distributed processing system.
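How the two signals are derived from their trigger conditions can be illustrated with a small Python model. It is a sketch under stated assumptions: Watchdog and ControllerHealth are hypothetical names, the watchdog is modelled in software with an explicit time argument (in the described system it is a hardware monitor), and the timeout value is arbitrary.

```python
from dataclasses import dataclass


class Watchdog:
    """Software model of a watchdog timer (assumption: timeout given in seconds)."""

    def __init__(self, timeout_s: float, now: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.last_kick = now

    def kick(self, now: float) -> None:
        # Healthy software calls this periodically to restart the timer.
        self.last_kick = now

    def expired(self, now: float) -> bool:
        return (now - self.last_kick) > self.timeout_s


@dataclass
class ControllerHealth:
    watchdog: Watchdog
    mgmt_software_fault: bool = False  # fault raised by the system management layer
    network_comm_fault: bool = False   # fault reported for network communication

    def system_fault_signal(self, now: float) -> bool:
        # Valid on a watchdog timeout or a management-layer software fault,
        # invalid at all other times.
        return self.watchdog.expired(now) or self.mgmt_software_fault

    def network_fault_signal(self) -> bool:
        return self.network_comm_fault


if __name__ == "__main__":
    health = ControllerHealth(Watchdog(timeout_s=0.5))
    health.watchdog.kick(now=1.0)
    print(health.system_fault_signal(now=1.25))  # watchdog was kicked recently
    print(health.system_fault_signal(now=2.0))   # watchdog has timed out
```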
Further, the method is stored as a computer program in the memory of a computer; the computer comprises a processor and a memory, and the processor implements the steps of the method for determining and switching the master controller and the slave controller of the distributed processing system when it executes the computer program.
Further, the method is stored as a computer program in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method for determining and switching the master controller and the slave controller of the distributed processing system are implemented.
Compared with the prior art, the invention has the following technical characteristics:
In this scheme, the master controller and the slave controller work in a 1+1 hot backup mode. After a normal start, system management reads the controller slot identification numbers; by default, the controller in slot 1 runs as the master controller with its occupancy flag valid, the controller in slot 2 runs as the slave controller, and a switchover is performed when the master controller fails. The master controller is fault tolerant and controls and manages the software and hardware resources of the distributed processing system so that they work cooperatively. With dual controllers and the system fault and network communication fault signals, fast switching and task recovery can be achieved; the fault tolerance of the dual-controller arrangement improves system reliability; the master and standby identities of the two controllers are determined simply, and the switching is efficient.
Drawings
FIG. 1 is a schematic diagram of the system structure of the method of the present invention.
Detailed Description
The software and hardware resources and system tasks of a distributed processing system need a controller with a master/standby fault-tolerance mechanism in order to work cooperatively under unified control and management. Determining the master and standby controllers at start-up, and switching between them when a fault occurs during operation, are key functions of the distributed processing system and bear directly on the reliability with which system application tasks run. They are especially important for restoring system operation quickly and efficiently in a hard real-time system. In this scheme, fast switching and task recovery can be achieved with dual controllers and the system fault and network communication fault signals.
The invention discloses a method for determining and switching the master controller and the slave controller of a distributed processing system, which comprises the following steps:
providing two system controllers in the distributed processing system, which work in a 1+1 hot backup mode, one as the master controller and the other as the slave controller; configuring the initial states of the two system controllers, and setting a default master controller and the master controller's occupancy flag; defining control signals that govern master/slave switching, and configuring a master/slave switching strategy for the different control signal states; during normal operation in the 1+1 hot backup mode, the two system controllers are controlled by the control signals and the switching strategy.
Configuring the initial states of the two system controllers and setting the default master controller and the master controller's occupancy flag comprises:
when the distributed processing system starts, first initializing the system hardware and setting the initial state of both system controllers to standby mode; the system performs a power-on self-test and records the result; the system reads the slot identification number of each system controller, sets the system controller with slot identification number 1 as the default master controller and the system controller with slot identification number 2 as the slave controller, and sets the master controller's occupancy flag to valid.
The control signals that govern master/slave switching are a system fault signal and a network communication fault signal, each of which has its own signal state. The state of each signal is either valid or invalid.
When the system controller with slot identification number 1 is the master controller and its system fault signal or network communication fault signal is valid, the system controller with slot identification number 2 takes over as master controller, and the occupancy flag of the system controller with slot identification number 2 is set to valid; when the network communication fault signals of both system controllers are valid, the distributed processing system enters an emergency working state in which only the two system controllers work; when the system fault signals of both system controllers are valid, the distributed processing system enters a failed state.
The system fault signal becomes valid when a watchdog timeout fault or a system management layer software fault occurs in the distributed processing system; at all other times it is invalid;
the network communication fault signal becomes valid when a network communication fault occurs in the distributed processing system.
The method is implemented as follows:
The distributed processing system is provided with two system controllers that work in a 1+1 hot backup mode. When the processing system starts, the hardware is initialized and both controllers are initially in standby mode. The system performs the PUBIT (power-up built-in test) and records the result; system management reads the controller slot identification numbers, sets controller 1 in slot 1 to run as the master controller and controller 2 in slot 2 to run as the slave controller, and controller 1 sets the occupancy flag CONFLAG to valid. During normal operation in the 1+1 hot backup mode, the two controllers are governed by the system fault signal SYSFAIL and the network communication fault signal FCFAIL. When controller 1 is the master controller and its SYSFAIL or FCFAIL signal becomes valid, the master role switches to controller 2, and controller 2 sets the occupancy flag CONFLAG to valid. When the FCFAIL signals of both controllers are valid, network communication has failed, and the processing system enters an emergency working state in which only the controllers work. When the SYSFAIL signals of both controllers are valid, both controllers have failed, and the processing system enters a failed state. The SYSFAIL signal becomes valid on a watchdog timeout fault (passive: the hardware detects that the software has run away) or on a fault set by software (active). The FCFAIL signal becomes valid on a network communication fault.
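The hand-over of the CONFLAG occupancy flag over successive signal samples can be traced with a small stateful model. This Python sketch is illustrative and rests on assumptions not found in the source: the class name DualController, the encoding of SYSFAIL and FCFAIL as bit flags, the string mode names, and the choice to latch the emergency and failed modes (the source does not describe any recovery path).

```python
from enum import IntFlag
from typing import Tuple


class Fault(IntFlag):
    NONE = 0
    SYSFAIL = 1  # system fault signal is valid
    FCFAIL = 2   # network communication fault signal is valid


class DualController:
    def __init__(self) -> None:
        # After start-up, controller 1 (slot 1) holds CONFLAG.
        self.conflag = {1: True, 2: False}
        self.mode = "normal"

    def master(self) -> int:
        return next(slot for slot, held in self.conflag.items() if held)

    def sample(self, fault1: Fault, fault2: Fault) -> Tuple[int, str]:
        """Apply one sample of both controllers' signals; return (master, mode)."""
        if fault1 & Fault.SYSFAIL and fault2 & Fault.SYSFAIL:
            self.mode = "failed"        # both controllers have failed
        elif fault1 & Fault.FCFAIL and fault2 & Fault.FCFAIL:
            self.mode = "emergency"     # network communication has failed
        elif self.conflag[1] and fault1 != Fault.NONE:
            # Controller 1 is master and one of its signals is valid:
            # controller 2 takes the master role and sets CONFLAG.
            self.conflag = {1: False, 2: True}
        return self.master(), self.mode


if __name__ == "__main__":
    pair = DualController()
    timeline = [
        (Fault.NONE, Fault.NONE),
        (Fault.SYSFAIL, Fault.NONE),
        (Fault.SYSFAIL, Fault.SYSFAIL),
    ]
    for f1, f2 in timeline:
        print(pair.sample(f1, f2))
```

In this trace, controller 1 starts as master, controller 2 takes over when SYSFAIL of controller 1 becomes valid, and the system is marked failed once SYSFAIL is valid on both controllers.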
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equally replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application, and are intended to be included within the scope of the present application.

Claims (3)

1. A method for determining and switching the master controller and the slave controller of a distributed processing system, characterized by comprising the following steps:
providing two system controllers in the distributed processing system, which work in a 1+1 hot backup mode, one as the master controller and the other as the slave controller;
configuring the initial states of the two system controllers, and setting a default master controller and the master controller's occupancy flag;
defining control signals that govern master/slave switching, and configuring a master/slave switching strategy for the different control signal states; during normal operation in the 1+1 hot backup mode, the two system controllers are controlled by the control signals and the switching strategy;
wherein configuring the initial states of the two system controllers and setting the default master controller and the master controller's occupancy flag comprises:
when the distributed processing system starts, first initializing the system hardware and setting the initial state of both system controllers to standby mode;
the system performs a power-on self-test and records the result; the system reads the slot identification number of each system controller, sets the system controller with slot identification number 1 as the default master controller and the system controller with slot identification number 2 as the slave controller, and sets the master controller's occupancy flag to valid;
the control signals that govern master/slave switching are a system fault signal and a network communication fault signal, each of which has its own signal state;
the state of the system fault signal and of the network communication fault signal is either valid or invalid;
when the system controller with slot identification number 1 is the master controller and its system fault signal or network communication fault signal is valid, the system controller with slot identification number 2 takes over as master controller, and the occupancy flag of the system controller with slot identification number 2 is set to valid;
when the network communication fault signals of both system controllers are valid, the distributed processing system enters an emergency working state in which only the two system controllers work;
when the system fault signals of both system controllers are valid, the distributed processing system enters a failed state;
the system fault signal becomes valid when a watchdog timeout fault or a system management layer software fault occurs in the distributed processing system; at all other times it is invalid;
the network communication fault signal becomes valid when a network communication fault occurs in the distributed processing system.
2. The method for determining and switching the master controller and the slave controller of a distributed processing system according to claim 1, wherein the method is stored as a computer program in the memory of a computer; the computer comprises a processor and a memory, and the processor implements the steps of the method according to claim 1 when executing the computer program.
3. The method for determining and switching the master controller and the slave controller of a distributed processing system according to claim 1, wherein the method is stored as a computer program in a computer-readable storage medium; the computer program, when executed by a processor, implements the steps of the method according to claim 1.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110243660.2A CN113050407B (en) 2021-03-04 2021-03-04 Method for determining and switching master controller and slave controller of distributed processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110243660.2A CN113050407B (en) 2021-03-04 2021-03-04 Method for determining and switching master controller and slave controller of distributed processing system

Publications (2)

Publication Number Publication Date
CN113050407A (en) 2021-06-29
CN113050407B (en) 2022-11-22

Family

ID=76510225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110243660.2A Active CN113050407B (en) 2021-03-04 2021-03-04 Method for determining and switching master controller and slave controller of distributed processing system

Country Status (1)

Country Link
CN (1) CN113050407B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114484766B (en) * 2021-12-21 2023-04-07 珠海格力电器股份有限公司 Method for determining master controller and related equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102799104A (en) * 2012-07-02 2012-11-28 浙江正泰中自控制工程有限公司 Safety control redundant system and method for fully-intelligent master control system
CN106444685A (en) * 2016-12-06 2017-02-22 中国船舶重工集团公司第七〇九研究所 Distributed control system and method of distributed control system for dynamic scheduling resources
CN107733684A (en) * 2017-08-31 2018-02-23 北京宇航系统工程研究所 A kind of multi-controller computing redundancy cluster based on Loongson processor
CN108803560A (en) * 2018-05-03 2018-11-13 南京航空航天大学 Synthesization DC solid-state power controller and failure decision diagnostic method

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
KR100393482B1 (en) * 2001-02-02 2003-08-02 두산중공업 주식회사 Hot back-up device for double excitation system
CN1968075B (en) * 2006-05-23 2010-05-12 华为技术有限公司 Distributed hot-standby logic device and primary/standby board setting method
CN101430550B (en) * 2007-03-30 2010-12-01 哈尔滨工程大学 Switch control method of engine redundancy electric-control system
CN100492223C (en) * 2007-03-30 2009-05-27 哈尔滨工程大学 Switch circuit for engine redundant electrically-controlled system
CN102541697A (en) * 2010-12-31 2012-07-04 中国航空工业集团公司第六三一研究所 Switching method for processing fault of dual-redundancy computer
CN110677282B (en) * 2019-09-23 2022-05-17 天津津航计算技术研究所 Hot backup method of distributed system and distributed system
CN112130448B (en) * 2020-09-25 2024-06-21 北京交大思诺科技股份有限公司 Dual-host-standby switching method

Also Published As

Publication number Publication date
CN113050407A (en) 2021-06-29

Similar Documents

Publication Publication Date Title
US7334027B2 (en) Controlling method, computer system, and processing program of booting up a computer
US20180048522A1 (en) Instance node management method and management device
US8032786B2 (en) Information-processing equipment and system therefor with switching control for switchover operation
CN101635646A (en) Method and system for switching main/standby board cards
CN103853622A (en) Control method of dual redundancies capable of being backed up mutually
CN110427283B (en) Dual-redundancy fuel management computer system
CN109308242B (en) Dynamic monitoring method, device, equipment and storage medium
CN113050407B (en) Method for determining and switching master controller and slave controller of distributed processing system
US20040177242A1 (en) Dynamic computer system reset architecture
CN102508746A (en) Management method for triple configurable fault-tolerant computer system
CN116881053B (en) Data processing method, exchange board, data processing system and data processing device
CN114337944B (en) System-level main/standby redundancy general control method
CN113742165B (en) Dual master control equipment and master-slave control method
CN101557307B (en) Dispatch automation system application state management method
CN110764829B (en) Multi-path server CPU isolation method and system
CN103297279A (en) Switching method of main and backup single disks of software control in multi-software process system
CN101770211B (en) Vehicle integrated data processing method capable of realizing real-time failure switching
EP3471339B1 (en) Method and enabling device for starting physical device
CN100395962C (en) Method and system for equipment switching in communication system
CN111917588A (en) Edge device management method, device, edge gateway device and storage medium
JP5285044B2 (en) Cluster system recovery method, server, and program
CN109358982B (en) Hard disk self-healing device and method and hard disk
CN116340058A (en) Master-slave switching method and device
CN110677288A (en) Edge computing system and method generally used for multi-scene deployment
KR101447024B1 (en) Error restoration method of distributed multi-layer system for weapon based on service-scale

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant