CN106201981B - Adaptive reconfiguration method for the multi-CPU system of a near-space airship onboard computer - Google Patents

Adaptive reconfiguration method for the multi-CPU system of a near-space airship onboard computer

Info

Publication number
CN106201981B
Authority
CN
China
Prior art keywords
anomaly
heartbeat
count
subsystem
watchdog
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610475340.9A
Other languages
Chinese (zh)
Other versions
CN106201981A (en)
Inventor
吕达
颜坤
汪青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Near Space Airship Technology Development Co Ltd
Original Assignee
Beijing Near Space Airship Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Near Space Airship Technology Development Co Ltd
Priority to CN201610475340.9A
Publication of CN106201981A
Application granted
Publication of CN106201981B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/167 Interprocessor communication using a common memory, e.g. mailbox
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F15/7871 Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
    • G06F15/7882 Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS for self reconfiguration

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Hardware Redundancy (AREA)

Abstract

The present invention provides an adaptive reconfiguration method for the multi-CPU system of a near-space airship onboard computer. The multi-CPU system comprises two identical subsystems: a primary subsystem and a backup subsystem. The primary subsystem comprises a central processing unit A0, an RS422 interface unit A1 and a 1553B expansion unit A2; the backup subsystem comprises a central processing unit B0, an RS422 interface unit B1 and a 1553B expansion unit B2. The method comprises: step 1) establishing communication connections among the units of the primary subsystem and the backup subsystem of the multi-CPU system; step 2) using the message mailboxes and shared memory between the CPUs, detecting the unit faults of each subsystem by combining watchdog and heartbeat detection inside the primary subsystem, inside the backup subsystem and between the primary and backup subsystems, and establishing a health matrix; step 3) reconfiguring the on-duty system of the multi-CPU system based on the health matrix of step 2), on the principle that the units of the primary subsystem have the higher priority.

Description

Adaptive reconfiguration method for the multi-CPU system of a near-space airship onboard computer
Technical field
The present invention relates to the field of near-space airship onboard computers, and in particular to an adaptive reconfiguration method for the multi-CPU system of a near-space airship onboard computer.
Background art
A stratospheric airship is a lighter-than-air aerostat that stays aloft by aerostatic buoyancy. It can hover for long periods in the stratosphere above any geographical location and work continuously around the clock, which makes it particularly suitable for earth observation over China's territory and border areas. The onboard computer is the brain of the airship. As the integrated management center of the airship platform, it combines the navigation control, environmental control, data transmission and antenna, telemetry and telecommand, and energy and propulsion subsystems into an organic whole. It carries out the flight and payload tasks required by mission planning, acquires and downlinks telemetry data and payload information in a timely manner, executes telecommands from the ground control station, and serves as the hub connecting the airborne and ground command systems. The performance of the onboard computer therefore directly affects how well the airship platform completes its tasks, and its reliability determines the survivability of the airship. Because a stratospheric airship stays aloft on long missions, the probability that the onboard computer suffers a fault or error during a mission increases. To guarantee flight safety, the onboard computer must offer complete functionality, excellent performance, and high stability and reliability.
Improving the quality of components and of the assembly process alone can hardly meet the reliability requirement for the onboard computer. The architecture of the onboard computer must therefore be addressed: a redundant design is adopted and paired with a suitable fault-tolerant control strategy to guarantee correct flight management and control.
The choice of a redundancy scheme for the onboard computer is constrained by volume and power consumption: the scheme must not be overly complex, yet it must guarantee a certain level of reliability. In fault-tolerant airborne and spaceborne computer systems, the common redundancy configurations are dual, triple and quadruple redundancy. The U.S. F-16 fighter uses a quadruple-redundant digital computer system, the well-known high-altitude long-endurance reconnaissance UAV "Global Hawk" uses a dual-redundant flight control computer, and the high-altitude reconnaissance-strike UAV "Predator" uses a triple-redundant flight control computer system (see document [1] Greg Loegering, David Evans. The evolution of the Global Hawk & MALD avionics systems. Proceedings of the 18th Digital Avionics Systems Conference, 1999, 10(2): 24-29; document [2] Richard R. Grafton. Predator unmanned air vehicle system. SPIE Conference on Airborne Reconnaissance XXIII, 1999, 7: 53-63; document [3] Stephen A. Cambone, Kenneth J. Krieg, Peter Pace, et al. Unmanned Aircraft Systems (UAS) Roadmap. Office of the Secretary of Defense, 2005: 1-77).
The onboard computer of Harbin Institute of Technology's "Test Satellite No. 1" adopted a dual-redundant system (see document [4]: Xiang Lin, Wu Xianghu, Liao Minghong, Cui Gang, Yang Xiaozong. Research on the fault-tolerant control strategy of a small-satellite housekeeping computer system [J]. Journal of Astronautics, 2005, 27(3): 400-403). The onboard computer of the Chinese Academy of Sciences' "Innovation No. 1" satellite uses a dual-machine hot-standby control strategy (see document [5]: grandson Wang Ping Ning Lihuawangbaohai surpasses Yin's once mountain. Hardware and software design of a small-satellite fault-tolerant computer system [J]. Journal of Astronautics, 2006, 27(3): 412-415). Nanjing University of Aeronautics and Astronautics developed a distributed dual-redundant flight control computer for UAVs (see document [6]: Zhang Zhiwen. Research on key technologies of a dual-channel redundant flight control system computer [master's thesis]. Nanjing: Nanjing University of Aeronautics and Astronautics, 2012). A three-module voting strategy is highly reliable in short-life systems, but in long-life systems its reliability is lower than that of a dual-redundant system (see document [7]: Xiang Lin, Song Feng, Cui Gang, Yang Xiaozong. Design of the fault-tolerant architecture of a small-satellite housekeeping computer [J]. Aerospace Control, 2005, 23(2): 92-95). A dual-redundant scheme was therefore selected for the stratospheric airship onboard computer: machines A and B back each other up, and both use PowerPC processors.
The onboard computer uses a symmetrical design comprising two central processing units (CPUA0/CPUB0), two serial port expansion units (CPUA1/CPUB1) and two 1553B expansion units (CPUA2/CPUB2). Identical functional units have identical hardware structures and use similar redundancy, and all communication unit interfaces work in handshake mode. Each central processing unit includes two CAN interfaces, five RS422 interfaces, and DI and DO interfaces; it is responsible for flight control law computation, data management, telemetry and telecommand, and payload management, and is the master unit of the system. Each serial port expansion unit includes five RS422 buses; as a serial interface expansion unit, it handles input/output information transfer with RS422 peripherals, including acquisition, preprocessing and output of information, and is a slave unit of the system. Each 1553B expansion unit includes two 1553B interfaces and handles input/output information transfer with external 1553B nodes, including acquisition, preprocessing and output of information; it is likewise a slave unit of the system. Data exchange between the onboard computer and other onboard equipment is carried out through an onboard-computer interface information exchange system built for this purpose. The communication process is as follows: the two central processing units determine by identity identification which of them is the main central processing unit, and redundancy management is performed by combining shared memory, message mailboxes and the data bus. The main central processing unit receives uplink data from the navigation subsystem, the atmospheric environment subsystem and the energy system over the CAN/1553B bus, and then performs flight control law computation, management control and autonomous navigation. The main central processing unit holds the right to send downlink data and sends it over the CAN/RS422 bus to control the propellers of the power system. The hardware structure of the onboard computer is shown in Fig. 1: the ellipses between CPUs denote shared memory, and the bidirectional arrows denote message mailboxes.
In redundant designs, the heartbeat method is usually used to monitor the running state of the peer machine, as shown in Fig. 2, where the CPU that an arrow points to is the monitoring CPU. When the heartbeat signal between the two machines fails, the system has a fault and fault diagnosis is required. When the heartbeat signal of the peer is not received, there are two possible causes: 1) the software or hardware of the host or the standby itself has failed; 2) the heartbeat link has failed. Determining which cause is actually responsible is difficult, especially when the heartbeat link itself fails. In that case both the host and the standby conclude that the peer has failed: the host cuts the peer out of the dual-machine system and enters single-machine operation, while the standby tries to take over the host's work and puts itself into the working state. Two controllers are then working at the same time and control becomes chaotic, which is the so-called dual-host phenomenon (document [8]: Cao Mingtao. Application of a VxWorks-based dual-machine hot-standby system in embedded controllers [master's thesis]. Nanjing: Nanjing University of Posts and Telecommunications, 2012).
Redundancy management is the key to onboard computer design. It ensures that, when a fault occurs, the onboard computer can detect the fault in real time, diagnose the fault type, handle the fault and complete system reconfiguration. Sound redundancy management monitors for faults in real time and, when the system fails, minimizes the loss of system performance so as to guarantee flight safety; its design is therefore particularly important. A redundancy management strategy suited to the hardware configuration of the onboard computer must cover fault detection and diagnosis, and system reconfiguration and recovery.
Little literature currently exists on near-space airship onboard computers; spaceborne and airborne computers are more commonly discussed, with redundancy structures including dual redundancy, triple redundancy and so on. A three-module voting scheme is highly reliable in short-life systems, but in long-life systems its reliability is lower than that of a dual-redundant system. Because a stratospheric airship must stay aloft for a long time, several months or even half a year, a dual-redundant structure is the more suitable choice for the onboard computer.
Current dual-redundant computers mostly adopt system-level redundancy, whose reliability is lower than that of module-level redundancy. By default the host works and the standby monitors the host's working state through the heartbeat; if it detects that the host has failed, the standby becomes the new host and continues the work. Because fault detection relies on the heartbeat alone, when the heartbeat link is damaged it is impossible to tell whether the host or the heartbeat link has failed; even with a redundant heartbeat, a risk of common-mode failure remains. In addition, if machine A fails and machine B takes over, a subsequent failure of any unit of machine B leaves the whole system unable to work normally.
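The reliability gap between the two schemes can be illustrated with the textbook series/parallel model. The following is only a sketch under assumptions that are not stated in the source: every unit has the same reliability `r`, failures are independent, fault detection and switchover are perfect, and each machine consists of three units in series (central processing unit, RS422 unit, 1553B unit). The function names are hypothetical.

```python
def system_level(r, units=3):
    """Two whole machines in parallel; each machine is `units` units in series.

    R = 1 - (1 - r**units)**2
    """
    return 1 - (1 - r ** units) ** 2


def module_level(r, units=3):
    """Each unit is duplicated in parallel; the `units` pairs are in series.

    R = (1 - (1 - r)**2)**units
    """
    return (1 - (1 - r) ** 2) ** units


# Compare the two schemes for a few illustrative unit reliabilities.
for r in (0.90, 0.95, 0.99):
    print(f"r={r:.2f}  system-level={system_level(r):.6f}  "
          f"module-level={module_level(r):.6f}")
```

Under these assumptions, a unit reliability of 0.9 gives roughly 0.9266 for system-level redundancy and roughly 0.9703 for module-level redundancy, which is consistent with the statement that module-level redundancy is the more reliable of the two.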
Current dual-redundant computers with module-level redundancy use a plug-in board structure. The core components generally include a central processing unit, a serial communication unit, a switch (discrete I/O) interface unit and so on; each functional unit is plugged into a backplane and communicates over the backplane's internal bus. The central processing unit can detect faults in other units only through internal bus data. With this single detection method, once the internal bus is damaged detection becomes impossible, so reliability is not high; even if the internal bus is redundant, the risk of common-mode failure remains. When a unit fails, the system is reconfigured so that the standby unit takes over from the failed unit and continues to work with the original related units, but if the internal bus is damaged the system is paralyzed.
Summary of the invention
The object of the present invention is to overcome the above redundancy management problems of current onboard computer systems by proposing an adaptive reconfiguration method for the multi-CPU system of a near-space airship onboard computer. The method relies on the shared memory and message mailboxes that exist among the central processing units (A0/B0), the RS422 interface units (A1/B1) and the 1553B expansion units (A2/B2): the CPUs monitor one another through the heterogeneous combination of "watchdog" plus "heartbeat", so that faults are located at unit level; the units of the primary and backup systems are then combined to reconfigure the system.
To achieve the above object, the present invention provides an adaptive reconfiguration method for the multi-CPU system of a near-space airship onboard computer. The multi-CPU system comprises two identical subsystems: a primary subsystem and a backup subsystem. The primary subsystem comprises a central processing unit A0, an RS422 interface unit A1 and a 1553B expansion unit A2; the backup subsystem comprises a central processing unit B0, an RS422 interface unit B1 and a 1553B expansion unit B2. The method comprises:
Step 1) the units of the primary subsystem and the backup subsystem of the near-space airship onboard computer multi-CPU system establish communication connections;
Step 2) using the message mailboxes and shared memory between the CPUs, the unit faults of each subsystem are detected by combining watchdog and heartbeat detection inside the primary subsystem, inside the backup subsystem and between the primary and backup subsystems, and a health matrix is established;
Step 3) on the principle that the units of the primary subsystem have the higher priority, the on-duty system of the multi-CPU system is reconfigured based on the health matrix of step 2).
In the above technical solution, step 1) specifically comprises:
Step 101) A0 is initialized as a TCP server, and B1 and B2 are initialized as TCP clients; A0 establishes TCP/IP connections with B1 and B2 through the network port, and completes handshakes with A1 and A2 through the message mailbox;
Step 102) B0 is initialized as a TCP server, and A1 and A2 are initialized as TCP clients; B0 establishes TCP/IP connections with A1 and A2 through the network port, and completes handshakes with B1 and B2 through the message mailbox;
Steps 101) and 102) may be performed in either order.
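The link topology set up in step 1) can be written down as a small lookup table. This is a minimal sketch: the link types come from steps 101) and 102), while the names `LINKS`, `link_type` and `tcp_role` are hypothetical and chosen only for illustration.

```python
TCP = "tcp/ip"
MAILBOX = "message mailbox"

# (central processing unit, expansion unit) -> link used between them.
# Each CPU reaches its own units by mailbox and the peer's units by TCP/IP.
LINKS = {
    ("A0", "A1"): MAILBOX, ("A0", "A2"): MAILBOX,
    ("A0", "B1"): TCP,     ("A0", "B2"): TCP,
    ("B0", "B1"): MAILBOX, ("B0", "B2"): MAILBOX,
    ("B0", "A1"): TCP,     ("B0", "A2"): TCP,
}


def link_type(cpu, unit):
    """Return how a central processing unit reaches an expansion unit."""
    return LINKS[(cpu, unit)]


def tcp_role(name):
    """Central processing units are TCP servers; expansion units are clients."""
    return "server" if name in ("A0", "B0") else "client"


for (cpu, unit), kind in LINKS.items():
    print(f"{cpu} <-> {unit}: {kind}")
```

The table makes the cross-connection visible: every expansion unit is reachable from both central processing units, which is what later allows units from the two subsystems to be combined.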
In the above technical solution, step 2) comprises:
Step 201) in the detection initialization stage, A0 detects A1 and A2, B0 detects B1 and B2, and A0 and B0 detect each other;
Step 202) the unit faults of each subsystem are detected by combining watchdog and heartbeat detection inside the primary subsystem, inside the backup subsystem and between the primary and backup subsystems;
Step 203) a health matrix is established from the detection results of step 202);
The detection results form the health matrix X of the multi-CPU system:
$$X = \begin{bmatrix} a_0 & a_1 & a_2 \\ b_0 & b_1 & b_2 \end{bmatrix}$$
In matrix X, $a_0$, $a_1$, $a_2$ denote the health states of A0, A1 and A2 respectively, each taking the value 0 or 1, where 0 is normal and 1 is failed; $b_0$, $b_1$, $b_2$ denote the health states of B0, B1 and B2 respectively, each taking the value 0 or 1, where 0 is normal and 1 is failed.
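As an illustration of the health matrix, the sketch below builds it from a set of failed unit names. The 2x3 layout (row 0 for A0, A1, A2 and row 1 for B0, B1, B2) and the names `UNITS` and `health_matrix` are assumptions made for this example; the source only fixes the meaning of the six entries (0 normal, 1 failed).

```python
# Assumed layout: one row per subsystem, one column per unit type.
UNITS = [["A0", "A1", "A2"],
         ["B0", "B1", "B2"]]


def health_matrix(failed_units):
    """Build the health matrix: 0 marks a normal unit, 1 marks a failed one."""
    failed = set(failed_units)
    unknown = failed - {name for row in UNITS for name in row}
    if unknown:
        raise ValueError(f"unknown unit names: {sorted(unknown)}")
    return [[1 if name in failed else 0 for name in row] for row in UNITS]


# Example: the RS422 interface unit of the primary subsystem has failed.
X = health_matrix({"A1"})
print(X)
```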
In the above technical solution, step 202) specifically comprises:
Step 202-1) when the detection period arrives, A0 sends a heartbeat signal and a watchdog signal to B0, and then checks whether a heartbeat signal has been received from each of B0, A1 and A2: if it has, the corresponding heartbeat anomaly count is cleared; otherwise the count is incremented by 1. A0 then checks whether a watchdog signal has been received from each of B0, A1 and A2: if it has, the corresponding watchdog anomaly count is cleared; otherwise the count is incremented by 1. When the heartbeat anomaly count of B0/A1/A2 does not exceed 2, B0/A1/A2 is considered normal. When the heartbeat anomaly count of B0/A1/A2 exceeds 2, the watchdog anomaly count is checked: if it also exceeds 2, B0/A1/A2 is judged to have failed and its fault flag is set; otherwise the heartbeat link between A0 and B0/A1/A2 is judged to be damaged;
Step 202-2) when the detection period arrives, B0 sends a heartbeat signal and a watchdog signal to A0, and then checks whether a heartbeat signal has been received from each of A0, B1 and B2: if it has, the corresponding heartbeat anomaly count is cleared; otherwise the count is incremented by 1. B0 then checks whether a watchdog signal has been received from each of A0, B1 and B2: if it has, the corresponding watchdog anomaly count is cleared; otherwise the count is incremented by 1. When the heartbeat anomaly count of A0/B1/B2 does not exceed 2, A0/B1/B2 is considered normal. When the heartbeat anomaly count of A0/B1/B2 exceeds 2, the watchdog anomaly count is checked: if it also exceeds 2, A0/B1/B2 is judged to have failed and its fault flag is set; otherwise the heartbeat link between B0 and A0/B1/B2 is judged to be damaged;
Steps 202-1) and 202-2) may be performed in either order.
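The counting logic of steps 202-1) and 202-2) can be sketched as one small monitor object per watched unit. The threshold of 2 and the three verdicts follow the text; the class name `PeerMonitor`, the method name `update` and the verdict strings are hypothetical.

```python
from dataclasses import dataclass

THRESHOLD = 2  # a count must exceed this value before it is treated as abnormal


@dataclass
class PeerMonitor:
    """Anomaly counters that one detecting CPU keeps for one watched unit."""
    heartbeat_anomalies: int = 0
    watchdog_anomalies: int = 0

    def update(self, heartbeat_received, watchdog_received):
        """Process one detection period and return the verdict for the unit."""
        # A received signal clears its counter; a missing one increments it.
        self.heartbeat_anomalies = (
            0 if heartbeat_received else self.heartbeat_anomalies + 1)
        self.watchdog_anomalies = (
            0 if watchdog_received else self.watchdog_anomalies + 1)

        if self.heartbeat_anomalies <= THRESHOLD:
            return "normal"
        if self.watchdog_anomalies > THRESHOLD:
            return "unit_failed"            # the fault flag would be set here
        return "heartbeat_link_damaged"


# Three silent periods in a row: the third one pushes both counts above 2.
monitor = PeerMonitor()
print([monitor.update(False, False) for _ in range(3)])
```

Because the verdict changes only after the count exceeds the threshold, a single missed period does not trigger a switchover.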
In the above technical solution, step 3) specifically comprises:
Step 301) on the principle that the units of the primary subsystem have the higher priority, the duty matrix W of the current system is obtained by combining the priority with the health matrix X:
$$W = \begin{bmatrix} w_{a0} & w_{a1} & w_{a2} \\ w_{b0} & w_{b1} & w_{b2} \end{bmatrix}$$
In the duty matrix W, $w_{a0}$, $w_{a1}$, $w_{a2}$ are the duty states of CPUA0, CPUA1 and CPUA2 respectively, each taking the value 0 or 1, where 0 is on duty and 1 is not on duty; $w_{b0}$, $w_{b1}$, $w_{b2}$ are the duty states of CPUB0, CPUB1 and CPUB2 respectively, each taking the value 0 or 1, where 0 is on duty and 1 is not on duty;
Step 302) the units in the on-duty state are selected from the duty matrix to form the primary subsystem, realizing reconfiguration of the system.
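One way to derive a duty matrix from a health matrix is sketched below. The rule used here is an interpretation, not a formula given in the source: for each unit type, the A unit is on duty if it is healthy; otherwise the B unit is on duty if it is healthy; otherwise neither is. Both matrices are assumed to be 2x3 lists (row 0 for A, row 1 for B), and the function name `duty_matrix` is hypothetical.

```python
def duty_matrix(health):
    """Derive the duty matrix from the health matrix with priority A > B.

    Health entries: 0 = normal, 1 = failed.
    Duty entries:   0 = on duty, 1 = not on duty.
    """
    a_row, b_row = health
    w_a, w_b = [], []
    for a, b in zip(a_row, b_row):
        if a == 0:            # a healthy A unit always takes the duty
            w_a.append(0)
            w_b.append(1)
        elif b == 0:          # A failed, the healthy B unit takes over
            w_a.append(1)
            w_b.append(0)
        else:                 # both failed: no unit of this type can be on duty
            w_a.append(1)
            w_b.append(1)
    return [w_a, w_b]


# A1 has failed, so B1 joins A0 and A2 in the reconfigured system.
print(duty_matrix([[0, 1, 0], [0, 0, 0]]))
```

With this rule, at most one unit of each type is on duty at any time, which matches the stated goal of avoiding two active controllers.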
The advantage of the present invention is as follows: with the method of the present invention, the onboard computer handles faults according to the predetermined redundancy management strategy, so that when a fault occurs it can detect the fault in real time, diagnose the fault type, and complete the reconfiguration and recovery of the system, improving the reliability and completeness of the whole system.
Brief description of the drawings
Fig. 1 is the hardware structure diagram of the onboard computer;
Fig. 2 is the mutual detection relationship diagram of the onboard computer CPUs;
Fig. 3 is the system initialization flowchart of the onboard computer;
Fig. 4 is the fault detection flowchart of the onboard computer multi-CPU system;
Fig. 5 is the reconfiguration diagram of the onboard computer multi-CPU parallel processing system;
Fig. 6 is the workflow diagram of the onboard computer system of the present invention;
Fig. 7 is the simulation schematic diagram of the onboard computer multi-CPU system;
Fig. 8 is the topology diagram of module-level dual redundancy of the computer's functional modules;
Fig. 9 is the topology diagram of system-level dual redundancy;
Fig. 10 is the reliability plot of module-level redundancy and system-level redundancy for the onboard computer.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments.
The onboard computer uses a symmetrical design comprising two central processing units (CPUA0/CPUB0), two serial port expansion units (CPUA1/CPUB1), two 1553B expansion units (CPUA2/CPUB2) and two power modules. Identical functional units have identical hardware structures and use similar redundancy, and all communication unit interfaces work in handshake mode. Each central processing unit includes two CAN interfaces, five RS422 interfaces, and DI and DO interfaces; it is responsible for flight control law computation, data management, telemetry and telecommand, and payload management, and is the master unit of the system. Each serial port expansion unit includes five RS422 buses; as a serial interface expansion unit, it handles input/output information transfer with RS422 peripherals, including acquisition, preprocessing and output of information, and is a slave unit of the system. Each 1553B expansion unit includes two 1553B interfaces and handles input/output information transfer with external 1553B nodes, including acquisition, preprocessing and output of information; it is likewise a slave unit of the system. Data exchange between the onboard computer and other onboard equipment is carried out through an onboard-computer interface information exchange system built for this purpose. The communication process is as follows: the two central processing units determine by identity identification which of them is the main central processing unit, and redundancy management is performed by combining shared memory, message mailboxes and the data bus. The main central processing unit receives uplink data from the navigation subsystem, the atmospheric environment subsystem and the energy system over the CAN/1553B bus, and then performs flight control law computation, management control and autonomous navigation. The main central processing unit holds the right to send downlink data and sends it over the CAN/RS422 bus to control the propellers of the power system. The hardware structure of the onboard computer is shown in Fig. 1: the ellipses between CPUs denote shared memory, and the bidirectional arrows denote message mailboxes.
An adaptive reconfiguration method for the multi-CPU system of a near-space airship onboard computer, the method comprising:
Step 1) the units of the onboard computer multi-CPU system perform primary/backup identification and establish communication connections;
As shown in Fig. 3, step 1) specifically comprises:
Step 101) the two central processing units perform primary/backup identification by reading data from a designated address; by default A is the host and B is the standby;
Step 102) A0 establishes TCP/IP connections with B1 and B2 through the network port and completes handshakes with A1 and A2 through the message mailbox, initializing RS422 and 1553B;
The detailed process is as follows: A0 is initialized as a TCP server and B1 and B2 as TCP clients; A0 establishes connections with B1 and B2 and can then communicate with them over TCP/IP. Meanwhile, A0 sends ID information to A1 and A2 through the message mailbox. After receiving the ID information, A1 and A2 initialize the RS422 interface and the 1553B bus respectively, with the 1553B of A2 initialized as the bus controller (BC). A1 and A2 then send feedback to A0; once A0 receives it, the handshake between A0 and A1, A2 is complete.
Step 103) B0 establishes TCP/IP connections with A1 and A2 through the network port and completes handshakes with B1 and B2 through the message mailbox, initializing RS422 and 1553B;
The detailed process is as follows: B0 is initialized as a TCP server and A1 and A2 as TCP clients; B0 establishes connections with A1 and A2 and can then communicate with them over TCP/IP. Meanwhile, B0 sends ID information to B1 and B2 through the message mailbox. After receiving the ID information, B1 and B2 initialize the RS422 interface and the 1553B bus respectively, with the 1553B of B2 initialized as the bus monitor (BM). B1 and B2 then send feedback to B0; once B0 receives it, the handshake between B0 and B1, B2 is complete.
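The mailbox handshake of steps 102) and 103) can be simulated in a few lines. This is only a sketch: `queue.Queue` objects stand in for the message mailboxes, the message fields are invented for illustration, and the real initialization of RS422 and 1553B hardware is reduced to a log entry.

```python
from queue import Queue


def handshake(cpu, units):
    """Simulate the ID/feedback exchange between a CPU and its expansion units.

    `units` maps a unit name to the interface it initializes after receiving
    the ID information. Returns a log of what each unit did.
    """
    log = []
    for unit, interface in units.items():
        to_unit, to_cpu = Queue(), Queue()   # stand-ins for one mailbox pair

        to_unit.put({"from": cpu, "id": unit})           # CPU sends ID information
        message = to_unit.get()                          # unit receives it
        log.append(f"{unit}: ID from {message['from']}, initialized {interface}")

        to_cpu.put({"from": unit, "status": "ready"})    # unit sends feedback
        if to_cpu.get()["status"] != "ready":            # CPU checks the feedback
            raise RuntimeError(f"handshake with {unit} failed")
    return log


# A2's 1553B becomes the bus controller (BC), B2's the bus monitor (BM).
log_a = handshake("A0", {"A1": "RS422", "A2": "1553B as BC"})
log_b = handshake("B0", {"B1": "RS422", "B2": "1553B as BM"})
print("\n".join(log_a + log_b))
```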
Step 2) using the message mailboxes and shared memory between the CPUs, faults are detected inside the primary subsystem, inside the backup subsystem and between the primary and backup subsystems by combining watchdog and heartbeat detection, and a health matrix is established. This specifically comprises:
Step 201) in the detection initialization stage, the central processing units (A0/B0) each detect their own serial port expansion unit (A1/B1) and 1553B expansion unit (A2/B2), and the two central processing units (A0/B0) detect each other;
Step 202) the detected CPU sends a heartbeat signal to the detecting CPU at a fixed period through the message mailbox, and at the same time periodically writes specific data to the detecting CPU through shared memory as the watchdog signal. When the detecting CPU receives both the heartbeat signal and the watchdog signal of the detected CPU, the detected CPU is working normally. When the detecting CPU cannot receive the heartbeat signal of the detected CPU but can receive its watchdog signal, the heartbeat link is judged to be damaged. When the detecting CPU can receive neither the heartbeat signal nor the watchdog signal of the detected CPU, the detected CPU is damaged; system switchover is executed and the standby takes over the host's work;
Because the running state of the host is judged with heartbeat and watchdog as heterogeneous, mutually backing detection methods, a false switchover and the dual-host state are avoided.
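The decision rule described above reduces to a small truth table, sketched here for a single detection result. The function names and verdict strings are hypothetical. The case "heartbeat received, watchdog missing" is not covered by the source; treating it as normal is an assumption made here, in line with step 202) inspecting the watchdog only after heartbeats are lost.

```python
def classify(heartbeat_received, watchdog_received):
    """Classify one detection result for a monitored CPU."""
    if heartbeat_received:
        # Assumption: the source does not describe "heartbeat present,
        # watchdog missing"; it is treated as normal in this sketch.
        return "normal"
    if watchdog_received:
        return "heartbeat_link_damaged"
    return "cpu_failed"


def standby_should_take_over(heartbeat_received, watchdog_received):
    """The standby takes over only when both channels are silent."""
    return classify(heartbeat_received, watchdog_received) == "cpu_failed"


# Enumerate all four combinations of the two signals.
for heartbeat in (True, False):
    for watchdog in (True, False):
        print(heartbeat, watchdog,
              classify(heartbeat, watchdog),
              standby_should_take_over(heartbeat, watchdog))
```

The key property is that a lost heartbeat alone never triggers a takeover, which is how the scheme avoids the dual-host state.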
As shown in Fig. 4, step 202) specifically comprises:
Step 202-1) when the detection period arrives, A0 sends a heartbeat signal and a watchdog signal to B0, and then checks whether a heartbeat signal has been received from each of B0, A1 and A2: if it has, the corresponding heartbeat anomaly count is cleared; otherwise the count is incremented by 1. A0 then checks whether a watchdog signal has been received from each of B0, A1 and A2: if it has, the corresponding watchdog anomaly count is cleared; otherwise the count is incremented by 1. When the heartbeat anomaly count of B0/A1/A2 does not exceed 2, B0/A1/A2 is considered normal. When the heartbeat anomaly count of B0/A1/A2 exceeds 2, the watchdog anomaly count is checked: if it also exceeds 2, B0/A1/A2 is judged to have failed and its fault flag is set; otherwise the heartbeat link between A0 and B0/A1/A2 is judged to be damaged;
Step 202-2) when the detection period arrives, B0 sends a heartbeat signal and a watchdog signal to A0, and then checks whether a heartbeat signal has been received from each of A0, B1 and B2: if it has, the corresponding heartbeat anomaly count is cleared; otherwise the count is incremented by 1. B0 then checks whether a watchdog signal has been received from each of A0, B1 and B2: if it has, the corresponding watchdog anomaly count is cleared; otherwise the count is incremented by 1. When the heartbeat anomaly count of A0/B1/B2 does not exceed 2, A0/B1/B2 is considered normal. When the heartbeat anomaly count of A0/B1/B2 exceeds 2, the watchdog anomaly count is checked: if it also exceeds 2, A0/B1/B2 is judged to have failed and its fault flag is set; otherwise the heartbeat link between B0 and A0/B1/B2 is judged to be damaged.
Steps 202-1) and 202-2) may be performed in either order.
Step 203) a health matrix is established from the detection results of step 202);
The detection results form the health matrix X of the multi-CPU system:
$$X = \begin{bmatrix} a_0 & a_1 & a_2 \\ b_0 & b_1 & b_2 \end{bmatrix}$$
In matrix X, $a_0$, $a_1$, $a_2$ denote the health states of CPUA0, CPUA1 and CPUA2 respectively, each taking the value 0 or 1, where 0 is normal and 1 is failed; $b_0$, $b_1$, $b_2$ denote the health states of CPUB0, CPUB1 and CPUB2 respectively, each taking the value 0 or 1, where 0 is normal and 1 is failed.
Multi-CPU system is reconstructed in the healthy matrix that step 3) is based on step 2);
As shown in figure 5, ship carries computer by the way of reconstructing on logical meaning, in hardware configuration and network topology structure It in the case where constant, when an error occurs, detect failure and completes to diagnose, and carry out system reconfiguration, realize seamless switching, allow System works normally again.
Step 301) The priority of the multi-CPU system is A > B; combining this priority with the health matrix X gives the duty matrix W of the current system;
In the duty matrix W, wa0, wa1 and wa2 are the duty states of CPU A0, CPU A1 and CPU A2 respectively; each takes the value 0 or 1, where 0 is on duty and 1 is not on duty. wb0, wb1 and wb2 are the duty states of CPU B0, CPU B1 and CPU B2 respectively; each takes the value 0 or 1, where 0 is on duty and 1 is not on duty.
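One way to read step 301) is sketched below. The layout of the matrices is not specified in the text, so this sketch stores both X and W as dictionaries keyed by unit name; the function name `duty_matrix` is hypothetical. It also assumes that each role (central processing unit, RS422 unit, 1553B unit) is decided independently, and that neither unit is put on duty when both units of a role are faulty.

```python
from typing import Dict

ROLES = ("0", "1", "2")  # 0: central processing unit, 1: RS422 unit, 2: 1553B unit


def duty_matrix(health: Dict[str, int]) -> Dict[str, int]:
    """Derive duty flags (0 = on duty, 1 = not on duty) from health flags (0 = normal, 1 = faulty)."""
    duty = {}
    for role in ROLES:
        a, b = "A" + role, "B" + role
        a_on = health[a] == 0                  # priority A > B: A serves whenever it is healthy
        b_on = (not a_on) and health[b] == 0   # B serves only when A is faulty and B is healthy
        duty[a] = 0 if a_on else 1
        duty[b] = 0 if b_on else 1
    return duty
```

For example, with only A1 faulty the result keeps A0 and A2 on duty and puts B1 on duty in place of A1.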
Step 302) The system is reconstructed according to the duty matrix.
Each central processing unit (A0/B0) determines from its corresponding state in the duty matrix whether it is on duty as the master. The on-duty central processing unit selects the corresponding RS422 interface unit (A1/B1) and 1553B expansion unit (A2/B2) according to the duty matrix to form the multi-CPU system and continue working, thereby reconstructing the system.
When central processing unit A0 needs to send data over RS422, it selects the on-duty module from A1/B1 for the transmission; when A0 needs to send data over 1553B, it selects the on-duty module from A2/B2. Thus, when CPU A1 or CPU A2 fails, the central processing unit can immediately substitute CPU B1 or CPU B2, achieving seamless switchover. Central processing unit B0 periodically queries the health status of A0; once A0 fails, B0 takes over the system with seamless switchover, achieving system reconstruction, as shown in Figure 6. While B0 is on duty, if the 1553B expansion unit or the RS422 expansion unit is damaged, arbitration is carried out according to the health matrix; the final system working state is shown in Figure 7.
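The selection described above can be sketched as a lookup in the duty matrix. The dictionary representation, the function name `active_configuration` and the keys `"master"`, `"rs422"` and `"bus_1553b"` are assumptions made for illustration; the text does not define a data structure for the reconstructed system.

```python
from typing import Dict, Optional


def active_configuration(duty: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Return the on-duty unit for each role, given duty flags (0 = on duty, 1 = not on duty)."""

    def pick(primary: str, backup: str) -> Optional[str]:
        # Return whichever unit of the pair is marked on duty, or None if neither is.
        if duty[primary] == 0:
            return primary
        if duty[backup] == 0:
            return backup
        return None

    return {
        "master": pick("A0", "B0"),
        "rs422": pick("A1", "B1"),
        "bus_1553b": pick("A2", "B2"),
    }
```

With a duty matrix in which A0, B1 and A2 are on duty, the sketch returns A0 as master sending RS422 data through B1 and 1553B data through A2, which is the cross-subsystem combination the text describes.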
The airship onboard computer uses an asymmetric dual-machine scheme: machine A runs as the master and machine B runs as the backup, and the data exchange between them is likewise asymmetric. When machine A is on duty, machine B does not output control; machine A has complete knowledge of the airship's operating state, while the backup, being in standby, does not take part in system control. Machine A transmits its own working state and the working state of the airship's other functional units to machine B in a timely manner, so that if a master/backup switchover or a master failure occurs, machine B can take over smoothly on the spot, ensuring continuous and stable operation of the airship. Machine B sends its own working-state data to machine A so that the ground can tell whether the backup is running normally.
Fault simulation verification
To verify the soundness of the redundancy management strategy designed with the method of the invention, a hardware-in-the-loop simulation platform consisting of the airship onboard computer, a real-time simulator and fault injection equipment is used for simulation verification. Fault injection simulation software injects various types of faults into the onboard computer, including bus node faults, input/output interface faults and central processing unit faults.
The test results show that the onboard computer handles each fault according to the predetermined redundancy management strategy: when a fault occurs it detects the fault in real time, diagnoses the fault type, and successfully completes system reconstruction and recovery, which demonstrates the soundness and completeness of the redundancy management strategy.
In a redundant architecture with the same number of redundant channels, different redundancy levels lead to very different system reliability. Redundancy can be applied at the component level, the functional module level or the system level; functional-module-level redundancy and system-level redundancy are the most commonly used.
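Assuming exponentially distributed lifetimes, the reliability function of the i-th unit is $R_i(t) = e^{-\lambda_i t}$, where $\lambda_i$ is the failure rate of the i-th unit. The reliability of a series system of n units is:
$$R_{s}(t) = \prod_{i=1}^{n} R_i(t) = e^{-\sum_{i=1}^{n} \lambda_i t}$$
The reliability of a parallel system of n units is:
$$R_{p}(t) = 1 - \prod_{i=1}^{n} \bigl(1 - R_i(t)\bigr)$$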
As shown in Figure 8, in functional-module-level dual redundancy each unit is first paralleled with its duplicate and the resulting pairs are then connected in series; the reliability of the module-level dual-redundant topology is:
$$R_{M}(t) = \prod_{i} \Bigl[1 - \bigl(1 - e^{-\lambda_i t}\bigr)^{2}\Bigr]$$
As shown in Figure 9, in system-level dual redundancy the units of each channel are first connected in series and the two channels are then paralleled; the reliability of the system-level dual-redundant topology is:
$$R_{S}(t) = 1 - \Bigl(1 - \prod_{i} e^{-\lambda_i t}\Bigr)^{2}$$
Based on the references and relevant experience, the failure rates are chosen as λ0 = 10^-5 /hour, λ2 = 5 × 10^-6 /hour and λ3 = 5 × 10^-6 /hour. As shown in Figure 10, in a redundant architecture with the same number of redundant channels, functional-module-level redundancy gives higher reliability than system-level redundancy. Therefore, the redundancy management of the airship onboard computer uses module-level dual redundancy.
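The comparison can be reproduced numerically with a short sketch. It assumes the standard series and parallel reliability formulas for exponentially distributed lifetimes, and it assumes that each channel consists of three units in series whose failure rates are the three values quoted above; the function names are hypothetical and the curve of Figure 10 itself is not reproduced.

```python
import math
from typing import Sequence

# Failure rates per hour, taken from the values quoted in the text.
FAILURE_RATES = (1e-5, 5e-6, 5e-6)


def module_level_reliability(t: float, rates: Sequence[float] = FAILURE_RATES) -> float:
    """Dual redundancy per unit: each unit is paralleled with its duplicate, pairs are in series."""
    reliability = 1.0
    for lam in rates:
        unit = math.exp(-lam * t)               # reliability of a single unit
        reliability *= 1.0 - (1.0 - unit) ** 2  # parallel pair of identical units
    return reliability


def system_level_reliability(t: float, rates: Sequence[float] = FAILURE_RATES) -> float:
    """Dual redundancy per channel: units in series form a channel, two channels in parallel."""
    channel = math.exp(-sum(rates) * t)         # series chain of all units
    return 1.0 - (1.0 - channel) ** 2           # two identical channels in parallel


if __name__ == "__main__":
    for hours in (1_000, 10_000, 100_000):
        print(hours, module_level_reliability(hours), system_level_reliability(hours))
```

Under these assumptions the module-level value is at least as large as the system-level value for every mission time, which matches the conclusion drawn from Figure 10.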

Claims (4)

1. An adaptive reconstruction method for a multi-CPU system of a near-space airship onboard computer, the multi-CPU system comprising two identical subsystems: a primary subsystem and a backup subsystem; the primary subsystem comprising a central processing unit A0, an RS422 interface unit A1 and a 1553B expansion unit A2; the backup subsystem comprising a central processing unit B0, an RS422 interface unit B1 and a 1553B expansion unit B2; the method comprising:
Step 1) establishing communication connections between the units of the primary subsystem and the backup subsystem of the multi-CPU system of the near-space airship onboard computer; step 1) specifically comprising:
Step 101) A0 is initialized as a TCP server, and B1 and B2 are initialized as TCP clients; A0 establishes TCP/IP connections with B1 and B2 through the network port, and at the same time completes handshakes with A1 and A2 through message mailboxes;
Step 102) B0 is initialized as a TCP server, and A1 and A2 are initialized as TCP clients; B0 establishes TCP/IP connections with A1 and A2 through the network port, and at the same time completes handshakes with B1 and B2 through message mailboxes;
There is no required order between step 101) and step 102);
Step 2) using message mailboxes and shared memory between CPUs, detecting the unit faults of each subsystem by combining watchdog and heartbeat within the primary subsystem, within the backup subsystem, and between the primary and backup subsystems, and establishing a health matrix;
Step 3) based on the principle that the units of the primary subsystem have the higher priority, reconstructing the on-duty system of the multi-CPU system according to the health matrix of step 2).
2. The adaptive reconstruction method for a multi-CPU system of a near-space airship onboard computer according to claim 1, characterized in that step 2) comprises:
Step 201) in the detection initialization stage, A0 detects A1 and A2, B0 detects B1 and B2, and A0 and B0 detect each other;
Step 202) detecting the unit faults of each subsystem by combining watchdog and heartbeat within the primary subsystem, within the backup subsystem, and between the primary and backup subsystems;
Step 203) establishing a health matrix from the detection results of step 202);
The detection results form the health matrix X of the multi-CPU system:
In matrix X, a0, a1 and a2 denote the health status of A0, A1 and A2 respectively; each takes the value 0 or 1, where 0 is normal and 1 is faulty; b0, b1 and b2 denote the health status of B0, B1 and B2 respectively; each takes the value 0 or 1, where 0 is normal and 1 is faulty.
3. The adaptive reconstruction method for a multi-CPU system of a near-space airship onboard computer according to claim 2, characterized in that step 202) specifically comprises:
Step 202-1) when a detection cycle arrives, A0 sends a heartbeat signal and a watchdog signal to B0, and then checks whether a heartbeat signal has been received from B0; if so, the heartbeat abnormality count is cleared, otherwise the corresponding heartbeat abnormality count is incremented by 1; A0 then checks whether a watchdog signal has been received from B0; if so, the watchdog abnormality count is cleared, otherwise the corresponding watchdog abnormality count is incremented by 1; while the heartbeat abnormality count of B0 does not exceed 2, B0 is considered normal; when the heartbeat abnormality count of B0 exceeds 2, the watchdog abnormality count is checked: if the watchdog abnormality count exceeds 2, B0 is judged to have failed and the B0 fault flag is set; otherwise the heartbeat link between A0 and B0 is judged to be damaged;
When a detection cycle arrives, A0 checks whether a heartbeat signal has been received from A1; if so, the heartbeat abnormality count is cleared, otherwise the corresponding heartbeat abnormality count is incremented by 1; A0 then checks whether a watchdog signal has been received from A1; if so, the watchdog abnormality count is cleared, otherwise the corresponding watchdog abnormality count is incremented by 1; while the heartbeat abnormality count of A1 does not exceed 2, A1 is considered normal; when the heartbeat abnormality count of A1 exceeds 2, the watchdog abnormality count is checked: if the watchdog abnormality count exceeds 2, A1 is judged to have failed and the A1 fault flag is set; otherwise the heartbeat link between A0 and A1 is judged to be damaged;
When a detection cycle arrives, A0 checks whether a heartbeat signal has been received from A2; if so, the heartbeat abnormality count is cleared, otherwise the corresponding heartbeat abnormality count is incremented by 1; A0 then checks whether a watchdog signal has been received from A2; if so, the watchdog abnormality count is cleared, otherwise the corresponding watchdog abnormality count is incremented by 1; while the heartbeat abnormality count of A2 does not exceed 2, A2 is considered normal; when the heartbeat abnormality count of A2 exceeds 2, the watchdog abnormality count is checked: if the watchdog abnormality count exceeds 2, A2 is judged to have failed and the A2 fault flag is set; otherwise the heartbeat link between A0 and A2 is judged to be damaged;
Step 202-2) when a detection cycle arrives, B0 sends a heartbeat signal and a watchdog signal to A0, and then checks whether a heartbeat signal has been received from A0; if so, the heartbeat abnormality count is cleared, otherwise the corresponding heartbeat abnormality count is incremented by 1; B0 then checks whether a watchdog signal has been received from A0; if so, the watchdog abnormality count is cleared, otherwise the corresponding watchdog abnormality count is incremented by 1; while the heartbeat abnormality count of A0 does not exceed 2, A0 is considered normal; when the heartbeat abnormality count of A0 exceeds 2, the watchdog abnormality count is checked: if the watchdog abnormality count exceeds 2, A0 is judged to have failed and the A0 fault flag is set; otherwise the heartbeat link between B0 and A0 is judged to be damaged;
When a detection cycle arrives, B0 checks whether a heartbeat signal has been received from B1; if so, the heartbeat abnormality count is cleared, otherwise the corresponding heartbeat abnormality count is incremented by 1; B0 then checks whether a watchdog signal has been received from B1; if so, the watchdog abnormality count is cleared, otherwise the corresponding watchdog abnormality count is incremented by 1; while the heartbeat abnormality count of B1 does not exceed 2, B1 is considered normal; when the heartbeat abnormality count of B1 exceeds 2, the watchdog abnormality count is checked: if the watchdog abnormality count exceeds 2, B1 is judged to have failed and the B1 fault flag is set; otherwise the heartbeat link between B0 and B1 is judged to be damaged;
When a detection cycle arrives, B0 checks whether a heartbeat signal has been received from B2; if so, the heartbeat abnormality count is cleared, otherwise the corresponding heartbeat abnormality count is incremented by 1; B0 then checks whether a watchdog signal has been received from B2; if so, the watchdog abnormality count is cleared, otherwise the corresponding watchdog abnormality count is incremented by 1; while the heartbeat abnormality count of B2 does not exceed 2, B2 is considered normal; when the heartbeat abnormality count of B2 exceeds 2, the watchdog abnormality count is checked: if the watchdog abnormality count exceeds 2, B2 is judged to have failed and the B2 fault flag is set; otherwise the heartbeat link between B0 and B2 is judged to be damaged;
There is no required order between step 202-1) and step 202-2).
4. The adaptive reconstruction method for a multi-CPU system of a near-space airship onboard computer according to claim 2, characterized in that step 3) specifically comprises:
Step 301) based on the principle that the units of the primary subsystem have the higher priority, combining the priority with the health matrix X to obtain the duty matrix W of the current system:
In the duty matrix W, wa0, wa1 and wa2 are the duty states of A0, A1 and A2 respectively; each takes the value 0 or 1, where 0 is on duty and 1 is not on duty; wb0, wb1 and wb2 are the duty states of B0, B1 and B2 respectively; each takes the value 0 or 1, where 0 is on duty and 1 is not on duty;
Step 302) selecting the units in the on-duty state from the duty matrix to form the primary subsystem, thereby realizing the reconstruction of the system.
CN201610475340.9A 2016-06-24 2016-06-24 A kind of near space ship load computer multi-CPU system self-adapting reconstruction method Active CN106201981B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610475340.9A CN106201981B (en) 2016-06-24 2016-06-24 A kind of near space ship load computer multi-CPU system self-adapting reconstruction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610475340.9A CN106201981B (en) 2016-06-24 2016-06-24 A kind of near space ship load computer multi-CPU system self-adapting reconstruction method

Publications (2)

Publication Number Publication Date
CN106201981A CN106201981A (en) 2016-12-07
CN106201981B true CN106201981B (en) 2018-12-28

Family

ID=57461732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610475340.9A Active CN106201981B (en) 2016-06-24 2016-06-24 A kind of near space ship load computer multi-CPU system self-adapting reconstruction method

Country Status (1)

Country Link
CN (1) CN106201981B (en)

Also Published As

Publication number Publication date
CN106201981A (en) 2016-12-07

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant