CN113050407A - Method for determining and switching master controller and slave controller of distributed processing system - Google Patents

Method for determining and switching master controller and slave controller of distributed processing system

Publication number
CN113050407A
Authority
CN
China
Prior art keywords
controller
distributed processing
controllers
processing system
switching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110243660.2A
Other languages
Chinese (zh)
Other versions
CN113050407B (en)
Inventor
李成文
杨军祥
湛文韬
何立军
秦琪
丰生磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN202110243660.2A
Publication of CN113050407A
Application granted
Publication of CN113050407B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B9/00 Safety arrangements
    • G05B9/02 Safety arrangements electric
    • G05B9/03 Safety arrangements electric with multiple-channel loop, i.e. redundant control systems

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses a method for determining and switching the master controller and the slave controller of a distributed processing system, comprising the following steps: providing two system controllers in the distributed processing system, the two system controllers working in a 1+1 hot backup mode, one as the master controller and the other as the slave controller; configuring the initial states of the two system controllers, and setting a default master controller and the occupancy flag of the master controller; setting control signals that govern master/slave switching, and configuring a master/slave switching strategy for the different control signal states; during normal operation in the 1+1 hot backup mode, the two system controllers are governed by the control signals and the switching strategy. With this scheme, the controller pair is fault tolerant, which improves system reliability; the master and standby identities of the two controllers are determined simply, and the switching is efficient.

Description

Method for determining and switching master controller and slave controller of distributed processing system
Technical Field
The invention belongs to the technical field of embedded computer system design.
Background
As embedded systems grow more complex, processing systems are required to deliver higher functional performance, and the distributed processing system has become a complex, multifunctional, multitasking computer system.
In the prior art, resource management and fault switchover are generally implemented through software-based determination; the fault determination and decision process is relatively complex and time-consuming, so the approach is inefficient and has poor real-time performance.
Disclosure of Invention
The invention provides a method for determining and switching the master controller and the slave controller of a distributed processing system, which addresses the complex switching schemes and low efficiency of the prior art.
To accomplish this, the invention adopts the following technical solution:
A method for determining and switching the master controller and the slave controller of a distributed processing system comprises the following steps:
providing two system controllers in the distributed processing system, the two system controllers working in a 1+1 hot backup mode, one system controller being the master controller and the other being the slave controller;
configuring the initial states of the two system controllers, and setting a default master controller and the occupancy flag of the master controller;
setting control signals that govern master/slave switching, and configuring a master/slave switching strategy for the different control signal states; during normal operation in the 1+1 hot backup mode, the two system controllers are governed by the control signals and the switching strategy.
Further, configuring the initial states of the two system controllers and setting the default master controller and the occupancy flag of the master controller comprises:
when the distributed processing system starts, first initializing the system hardware and setting the initial state of both system controllers to the standby working mode;
the system performs the power-on test and records the result; the system reads the slot identification number of each system controller, sets the system controller with slot identification number 1 as the default master controller, and sets the system controller with slot identification number 2 as the slave controller; the occupancy flag set by the master controller is valid.
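The initialization sequence above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Role`, `Controller`, and `assign_initial_roles` are hypothetical and do not come from the patent, and the "valid" state of the occupancy flag is modelled as `True`.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    STANDBY = "standby"  # initial mode of both controllers after hardware init
    MASTER = "master"
    SLAVE = "slave"


@dataclass
class Controller:
    slot_id: int                   # slot identification number read by the system
    role: Role = Role.STANDBY      # both controllers start in standby mode
    occupancy_flag: bool = False   # True models the "valid" state


def assign_initial_roles(controllers):
    """Slot 1 becomes the default master and sets its occupancy flag; slot 2 becomes the slave."""
    for c in controllers:
        if c.slot_id == 1:
            c.role, c.occupancy_flag = Role.MASTER, True
        elif c.slot_id == 2:
            c.role, c.occupancy_flag = Role.SLAVE, False
        else:
            raise ValueError(f"unexpected slot id {c.slot_id}")
    return controllers


# The order in which the controllers are listed does not matter; only the slot id does.
pair = assign_initial_roles([Controller(slot_id=2), Controller(slot_id=1)])
```

Because the role depends only on the slot identification number, no negotiation between the two controllers is needed at start-up, which is the simplicity the scheme claims.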
Further, the control signals that govern master/slave switching are a system fault signal and a network communication fault signal, each of which has distinct signal states.
Further, the system fault signal and the network communication fault signal each take one of two states: valid or invalid.
Further, when the system controller with slot identification number 1 is the master controller and its system fault signal or network communication fault signal becomes valid, the system controller with slot identification number 2 takes over as the master controller, and the occupancy flag of the system controller with slot identification number 2 becomes valid;
when the network communication fault signals of both system controllers are valid, the distributed processing system enters an emergency working state in which only the two system controllers work;
when the system fault signals of both system controllers are valid, the distributed processing system enters a failure state.
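The three switching rules above amount to a small decision table, sketched below in Python. The names `SystemState`, `Signals`, and `decide` are hypothetical. The text does not state an evaluation order, nor what happens when the two controllers raise different faults or when slot 2 is already master and then faults; this sketch checks the double system fault first and reports the uncovered combinations as `UNSPECIFIED` rather than guessing.

```python
from enum import Enum
from typing import NamedTuple


class SystemState(Enum):
    NORMAL = "normal"
    EMERGENCY = "emergency"      # both network communication fault signals valid
    FAILED = "failed"            # both system fault signals valid
    UNSPECIFIED = "unspecified"  # combination the source text does not cover


class Signals(NamedTuple):
    sys_fault: bool  # system fault signal, True = valid
    net_fault: bool  # network communication fault signal, True = valid


def decide(master_slot, slot1, slot2):
    """Return (new_master_slot, system_state) for one evaluation of the signals."""
    if slot1.sys_fault and slot2.sys_fault:
        return master_slot, SystemState.FAILED
    if slot1.net_fault and slot2.net_fault:
        return master_slot, SystemState.EMERGENCY
    faulted = {1: slot1.sys_fault or slot1.net_fault,
               2: slot2.sys_fault or slot2.net_fault}
    if faulted[1] and faulted[2]:
        # Mixed faults (one system fault, one network fault): not covered by the text.
        return master_slot, SystemState.UNSPECIFIED
    if master_slot == 1 and faulted[1]:
        # The only switchover the text describes: slot 1 -> slot 2.
        return 2, SystemState.NORMAL
    if master_slot == 2 and faulted[2]:
        # Switching back to slot 1 is not described, so no switch is assumed here.
        return 2, SystemState.UNSPECIFIED
    return master_slot, SystemState.NORMAL
```

For example, `decide(1, Signals(True, False), Signals(False, False))` hands the master role to slot 2, while a network fault on both controllers keeps the current master and reports the emergency state.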
Further, the system fault signal becomes valid on a watchdog timeout fault or a system management layer software fault in the distributed processing system; at all other times it is invalid;
the network communication fault signal becomes valid on a network communication failure of the distributed processing system.
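How the two signals could be derived from their trigger conditions is sketched below. This is an illustrative software model, not the patent's implementation: the class `FaultMonitor`, its fields, and the timeout value are assumptions, and time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass


@dataclass
class FaultMonitor:
    watchdog_timeout_s: float      # hypothetical timeout; the source gives no value
    last_kick_s: float = 0.0       # time the software last serviced the watchdog
    software_fault: bool = False   # set by the system management layer software
    network_down: bool = False     # set when network communication fails

    def kick(self, now_s):
        """Service the watchdog; healthy software calls this periodically."""
        self.last_kick_s = now_s

    def sys_fault(self, now_s):
        """System fault signal: watchdog timeout OR a software-reported fault."""
        watchdog_expired = (now_s - self.last_kick_s) > self.watchdog_timeout_s
        return watchdog_expired or self.software_fault

    def net_fault(self):
        """Network communication fault signal."""
        return self.network_down


monitor = FaultMonitor(watchdog_timeout_s=0.5)
monitor.kick(now_s=1.0)
healthy = monitor.sys_fault(now_s=1.2)   # watchdog serviced recently
expired = monitor.sys_fault(now_s=2.0)   # no service for longer than the timeout
```

The watchdog path catches software that has stopped running without any cooperation from that software, whereas the software-reported path lets the management layer declare a fault it has detected itself.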
Further, the method is stored as a computer program in the memory of a computer; the computer comprises a processor and a memory, and the steps of the method for determining and switching the master controller and the slave controller of the distributed processing system are carried out when the processor executes the computer program.
Further, the method is stored as a computer program in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method for determining and switching the master controller and the slave controller of the distributed processing system are carried out.
Compared with the prior art, the invention has the following technical features:
In this scheme, the master controller and the slave controller work in a 1+1 hot backup mode. After a normal start-up, system management reads the controller slot identification numbers; by default the controller in slot 1 runs as the master controller and the controller in slot 2 runs as the slave controller, the occupancy flag of the master controller is valid, and the master role is switched after a fault occurs. The master controller is fault tolerant and controls and manages the software and hardware resources of the distributed processing system so that they work cooperatively. With the dual controllers and the system fault and network communication fault signals, fast switching and task recovery can be achieved; the controller pair is fault tolerant, which improves system reliability; the master and standby identities of the two controllers are determined simply, and the switching is efficient.
Drawings
FIG. 1 is a schematic diagram of the system structure of the method of the present invention.
Detailed Description
The software and hardware resources and the system tasks of a distributed processing system need a controller with a master/standby fault-tolerance mechanism in order to work cooperatively under unified control and management. Determining the master and slave controllers at start-up and switching them when a fault occurs during operation are key functions of the distributed processing system and bear directly on the reliability with which system application tasks run. This matters greatly for the fast and efficient recovery of a system with strong real-time requirements; in this scheme, fast switching and task recovery are achieved with dual controllers and the system fault and network communication fault signals.
The invention discloses a method for determining and switching the master controller and the slave controller of a distributed processing system, comprising the following steps:
providing two system controllers in the distributed processing system, the two system controllers working in a 1+1 hot backup mode, one system controller being the master controller and the other being the slave controller; configuring the initial states of the two system controllers, and setting a default master controller and the occupancy flag of the master controller; setting control signals that govern master/slave switching, and configuring a master/slave switching strategy for the different control signal states; during normal operation in the 1+1 hot backup mode, the two system controllers are governed by the control signals and the switching strategy.
Configuring the initial states of the two system controllers and setting the default master controller and the occupancy flag of the master controller comprises:
when the distributed processing system starts, first initializing the system hardware and setting the initial state of both system controllers to the standby working mode; the system performs the power-on test and records the result; the system reads the slot identification number of each system controller, sets the system controller with slot identification number 1 as the default master controller, and sets the system controller with slot identification number 2 as the slave controller; the occupancy flag set by the master controller is valid.
The control signals that govern master/slave switching are a system fault signal and a network communication fault signal, each of which has distinct signal states. The system fault signal and the network communication fault signal each take one of two states: valid or invalid.
When the system controller with slot identification number 1 is the master controller and its system fault signal or network communication fault signal becomes valid, the system controller with slot identification number 2 takes over as the master controller, and the occupancy flag of the system controller with slot identification number 2 becomes valid; when the network communication fault signals of both system controllers are valid, the distributed processing system enters an emergency working state in which only the two system controllers work; when the system fault signals of both system controllers are valid, the distributed processing system enters a failure state.
The system fault signal becomes valid on a watchdog timeout fault or a system management layer software fault in the distributed processing system; at all other times it is invalid;
the network communication fault signal becomes valid on a network communication failure of the distributed processing system.
The method is implemented as follows:
The distributed processing system is provided with two system controllers that work in a 1+1 hot backup mode. When the processing system starts, the hardware is initialized and both controllers are initially in the standby working mode. The system performs PUBIT detection and records the result; system management reads the controller slot identification numbers, controller 1 in slot 1 is set to run as the master controller, controller 2 in slot 2 is set to run as the slave controller, and the occupancy flag CONFLAG set by controller 1 is valid. During normal operation in the 1+1 hot backup mode, the two controllers are governed by the system fault signal SYSFAIL and the network communication fault signal FCFAIL. When controller 1 is the master controller and its SYSFAIL or FCFAIL signal becomes valid, the master role switches to controller 2, and controller 2 sets the occupancy flag CONFLAG valid. When the FCFAIL signals of both controllers are valid, network communication has failed; the processing system enters the emergency working state and only the controllers work. When the SYSFAIL signals of both controllers are valid, both controllers have failed and the processing system enters the failure state. The SYSFAIL signal becomes valid on a watchdog timeout fault (passive: hardware detects that the software has run away) or a fault set by software (active). The FCFAIL signal becomes valid on a network communication failure.
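The sequence in this embodiment can be traced with a small stateful model that uses the signal names CONFLAG, SYSFAIL, and FCFAIL from the text. Everything else is an assumption made for illustration: the classes `Ctrl` and `Pair`, the state strings, and in particular the clearing of controller 1's CONFLAG on switchover, which the text does not mention (it only says that controller 2 sets CONFLAG valid).

```python
from dataclasses import dataclass, field


@dataclass
class Ctrl:
    slot: int
    CONFLAG: bool = False  # occupancy flag, True = valid
    SYSFAIL: bool = False  # system fault signal
    FCFAIL: bool = False   # network communication fault signal


@dataclass
class Pair:
    c1: Ctrl = field(default_factory=lambda: Ctrl(1))
    c2: Ctrl = field(default_factory=lambda: Ctrl(2))
    state: str = "standby"  # both controllers start in standby mode

    def power_up(self):
        # After hardware init and PUBIT, slot 1 runs as master and sets CONFLAG.
        self.c1.CONFLAG, self.c2.CONFLAG = True, False
        self.state = "normal"

    def master(self):
        """Return the controller whose CONFLAG is valid, or None before power-up."""
        for c in (self.c1, self.c2):
            if c.CONFLAG:
                return c
        return None

    def evaluate(self):
        if self.c1.SYSFAIL and self.c2.SYSFAIL:
            self.state = "failed"
        elif self.c1.FCFAIL and self.c2.FCFAIL:
            self.state = "emergency"
        elif self.c1.CONFLAG and (self.c1.SYSFAIL or self.c1.FCFAIL):
            # Master switches to controller 2, which sets CONFLAG valid.
            # Clearing controller 1's flag is an assumption of this sketch.
            self.c1.CONFLAG, self.c2.CONFLAG = False, True
        return self.state


pair = Pair()
pair.power_up()
trace = [(pair.master().slot, pair.state)]
pair.c1.SYSFAIL = True   # master (slot 1) raises a system fault
pair.evaluate()
trace.append((pair.master().slot, pair.state))
pair.c2.SYSFAIL = True   # the new master fails as well
pair.evaluate()
trace.append((pair.master().slot, pair.state))
print(trace)  # [(1, 'normal'), (2, 'normal'), (2, 'failed')]
```

The trace shows the two outcomes the embodiment distinguishes: a single fault on the master is absorbed by the switchover, while a system fault on both controllers leaves the processing system in the failure state.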
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equally replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application, and are intended to be included within the scope of the present application.

Claims (8)

1. A method for determining and switching the master controller and the slave controller of a distributed processing system, characterized by comprising the following steps:
providing two system controllers in the distributed processing system, the two system controllers working in a 1+1 hot backup mode, one system controller being the master controller and the other being the slave controller;
configuring the initial states of the two system controllers, and setting a default master controller and the occupancy flag of the master controller;
setting control signals that govern master/slave switching, and configuring a master/slave switching strategy for the different control signal states; during normal operation in the 1+1 hot backup mode, the two system controllers are governed by the control signals and the switching strategy.
2. The method for determining and switching the master controller and the slave controller of a distributed processing system according to claim 1, wherein configuring the initial states of the two system controllers and setting the default master controller and the occupancy flag of the master controller comprises:
when the distributed processing system starts, first initializing the system hardware and setting the initial state of both system controllers to the standby working mode;
the system performs the power-on test and records the result; the system reads the slot identification number of each system controller, sets the system controller with slot identification number 1 as the default master controller, and sets the system controller with slot identification number 2 as the slave controller; the occupancy flag set by the master controller is valid.
3. The method for determining and switching the master controller and the slave controller of a distributed processing system according to claim 1, wherein the control signals that govern master/slave switching are a system fault signal and a network communication fault signal, each of which has distinct signal states.
4. The method for determining and switching the master controller and the slave controller of a distributed processing system according to claim 1, wherein the system fault signal and the network communication fault signal each take one of two states: valid or invalid.
5. The method for determining and switching the master controller and the slave controller of a distributed processing system according to claim 1, wherein when the system controller with slot identification number 1 is the master controller and its system fault signal or network communication fault signal becomes valid, the system controller with slot identification number 2 takes over as the master controller, and the occupancy flag of the system controller with slot identification number 2 becomes valid;
when the network communication fault signals of both system controllers are valid, the distributed processing system enters an emergency working state in which only the two system controllers work;
when the system fault signals of both system controllers are valid, the distributed processing system enters a failure state.
6. The method for determining and switching the master controller and the slave controller of a distributed processing system according to claim 1, wherein:
the system fault signal becomes valid on a watchdog timeout fault or a system management layer software fault in the distributed processing system; at all other times it is invalid;
the network communication fault signal becomes valid on a network communication failure of the distributed processing system.
7. The method for determining and switching the master controller and the slave controller of a distributed processing system according to claim 1, wherein the method is stored as a computer program in the memory of a computer; the computer comprises a processor, a memory, and a computer program which, when executed by the processor, performs the steps of the method according to any one of claims 1 to 6.
8. The method for determining and switching the master controller and the slave controller of a distributed processing system according to claim 1, wherein the method is stored as a computer program in a computer-readable storage medium; the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202110243660.2A 2021-03-04 2021-03-04 Method for determining and switching master controller and slave controller of distributed processing system Active CN113050407B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110243660.2A CN113050407B (en) 2021-03-04 2021-03-04 Method for determining and switching master controller and slave controller of distributed processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110243660.2A CN113050407B (en) 2021-03-04 2021-03-04 Method for determining and switching master controller and slave controller of distributed processing system

Publications (2)

Publication Number Publication Date
CN113050407A (en) 2021-06-29
CN113050407B (en) 2022-11-22

Family

ID=76510225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110243660.2A Active CN113050407B (en) 2021-03-04 2021-03-04 Method for determining and switching master controller and slave controller of distributed processing system

Country Status (1)

Country Link
CN (1) CN113050407B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114484766A (en) * 2021-12-21 2022-05-13 珠海格力电器股份有限公司 Method for determining master controller and related equipment

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20020064509A (en) * 2001-02-02 2002-08-09 두산중공업 주식회사 Hot back-up device for double excitation system
CN1968075A (en) * 2006-05-23 2007-05-23 华为技术有限公司 Distributed hot-standby logic device and primary/standby board setting method
CN101030073A (en) * 2007-03-30 2007-09-05 哈尔滨工程大学 Switch circuit for engine redundant electrically-controlled system and its controlling method
CN101430550A (en) * 2007-03-30 2009-05-13 哈尔滨工程大学 Switch control method of engine redundancy electric-control system
CN102541697A (en) * 2010-12-31 2012-07-04 中国航空工业集团公司第六三一研究所 Switching method for processing fault of dual-redundancy computer
CN102799104A (en) * 2012-07-02 2012-11-28 浙江正泰中自控制工程有限公司 Safety control redundant system and method for fully-intelligent master control system
CN106444685A (en) * 2016-12-06 2017-02-22 中国船舶重工集团公司第七〇九研究所 Distributed control system and method of distributed control system for dynamic scheduling resources
CN107733684A (en) * 2017-08-31 2018-02-23 北京宇航系统工程研究所 A kind of multi-controller computing redundancy cluster based on Loongson processor
CN108803560A (en) * 2018-05-03 2018-11-13 南京航空航天大学 Synthesization DC solid-state power controller and failure decision diagnostic method
CN110677282A (en) * 2019-09-23 2020-01-10 天津津航计算技术研究所 Hot backup method of distributed system and distributed system
CN112130448A (en) * 2020-09-25 2020-12-25 北京交大思诺科技股份有限公司 Method for switching between main and standby machines

Also Published As

Publication number Publication date
CN113050407B (en) 2022-11-22

Similar Documents

Publication Publication Date Title
US20190303255A1 (en) Cluster availability management
US8032786B2 (en) Information-processing equipment and system therefor with switching control for switchover operation
CN110427283B (en) Dual-redundancy fuel management computer system
CN103853622A (en) Control method of dual redundancies capable of being backed up mutually
US20040177242A1 (en) Dynamic computer system reset architecture
CN113050407B (en) Method for determining and switching master controller and slave controller of distributed processing system
CN114337944B (en) System-level main/standby redundancy general control method
CN101557307B (en) Dispatch automation system application state management method
CN110764829B (en) Multi-path server CPU isolation method and system
JP5285045B2 (en) Failure recovery method, server and program in virtual environment
CN101686261A (en) RAC-based redundant server system
CN101770211B (en) Vehicle integrated data processing method capable of realizing real-time failure switching
JP5285044B2 (en) Cluster system recovery method, server, and program
CN103297279A (en) Switching method of main and backup single disks of software control in multi-software process system
JP2008152552A (en) Computer system and failure information management method
CN110677288A (en) Edge computing system and method generally used for multi-scene deployment
JP2014048933A (en) Plant monitoring system, plant monitoring method, and plant monitoring program
CN114138567A (en) Substrate management control module maintenance method, device, equipment and storage medium
CN110752955A (en) Seat invariant fault migration system and method
CN113742165B (en) Dual master control equipment and master-slave control method
JP5913003B2 (en) Computer control apparatus, method and program
KR0168947B1 (en) Method for booting node without disk in real-time distributing system
CN116881053B (en) Data processing method, exchange board, data processing system and data processing device
CN113741248B (en) Edge calculation controller and control system
JPH10133963A (en) Fault detecting and recovering system for computer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant