CN110989427A - Fault detection and health management method for multiprocessor computer - Google Patents

Info

Publication number
CN110989427A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911133703.0A
Other languages
Chinese (zh)
Inventor
窦爱萍
封安
吴志川
隽鹏辉
原晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-11-19
Filing date
2019-11-19
Publication date
2020-04-10
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN201911133703.0A
Publication of CN110989427A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00: Programme-control systems
    • G05B19/02: Programme-control systems electric
    • G05B19/04: Programme control other than numerical control, i.e. in sequence controllers or logic controllers
    • G05B19/042: Programme control other than numerical control, i.e. in sequence controllers or logic controllers using digital processors
    • G05B19/0423: Input/output
    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00: Program-control systems
    • G05B2219/20: Pc systems
    • G05B2219/24: Pc safety
    • G05B2219/24215: Scada supervisory control and data acquisition

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

The invention provides a fault detection and health management method for a core computing platform, based on a multi-core DSP (digital signal processor), in a missile-borne inertial navigation system. Fault detection and health management of the core computing platform are realized through the processes of data acquisition and transmission, data preprocessing, feature extraction, state monitoring and health management, using methods such as multi-core processor heartbeat monitoring, multi-power-supply monitoring, atmospheric pressure and ambient temperature monitoring, and high-speed 1553B bus monitoring.

Description

Fault detection and health management method for multiprocessor computer
Technical Field
The invention belongs to the technical field of embedded systems, and particularly relates to a fault detection and health management method of a multiprocessor computer.
Background
A missile-borne inertial navigation system is installed in a way that makes it hard to remove, is periodically exposed to extreme temperature and humidity in service, performs few missions over its life cycle, and spends long periods in storage. The core computing platform in such a system is constrained by these service conditions and therefore needs strong fault detection and health management capabilities. At the same time, the fault detection mechanism must offer a low false alarm rate, a high fault detection rate, and fault isolation capability.
In European and American countries, fault detection and health management technology is applied across a wide range of advanced weaponry and in large numbers. Among implementations of prognostics and health management (PHM) technology, the F-35 aircraft shows the most pronounced effect: the cannot-duplicate fault rate is reduced by 82 percent, maintenance manpower by 20 to 40 percent and the logistics footprint by 50 percent, the sortie generation rate is improved by 25 percent, the operation and support cost is reduced by more than 50 percent compared with earlier aircraft, and the service life reaches 8000 flight hours. These statistics demonstrate the important role of PHM in reducing maintenance and support costs, improving the safety, availability and integrity of weaponry, ensuring mission success and improving combat effectiveness.
The patent applied for by the Beijing Automation Control Equipment Research Institute, 'an inertia/satellite deep combination information processing hardware platform based on multi-core DSP', addresses the requirements that inertial/satellite deeply integrated navigation information processing algorithms place on data sharing, clock synchronization, real-time operation and computing capability. That invention describes only the implementation mechanism of the hardware resources and does not consider fault detection and health management within the hardware system.
The 'navigation resolving device based on the heterogeneous multi-core architecture', applied for by the institute of optoelectronic technology in China, solves the problems of a high-speed data exchange mechanism and a message synchronization mechanism among the basic processing units in a multi-core system. It has an extensible and tailorable hardware structure, can adapt to various navigation requirements and processing methods, and offers good real-time performance, flexibility and reliability. However, it contains no description of fault detection and health management for the multi-core processor cores and peripheral circuits in the navigation solver.
Disclosure of Invention
The invention aims to provide a fault detection and health management method for a multiprocessor computer. The method targets fault detection and health management in a core computing platform based on a multi-core DSP in airborne applications, provides fault detection mechanisms for functional components such as multi-core processing, temperature and air pressure acquisition, and the high-speed serial bus, and meets the requirements for autonomous fault detection and health management of a core processing platform in an embedded environment.
The invention is realized by adopting the following technical scheme:
A fault detection and health management method for a multiprocessor computer is based on a core computing platform composed of a data acquisition part, a data processing part, and a state monitoring and fault diagnosis part, wherein the data acquisition part comprises a plurality of sensors, a processor monitoring module and a bus interface module. The method specifically comprises the following steps:
(1) detecting faults of the multi-core DSP core circuit by means of a multi-core heartbeat circuit;
(2) monitoring the secondary power supply network and predicting its faults;
(3) acquiring and monitoring harsh-environment parameter sensors;
(4) monitoring high-speed (4M) 1553B bus communication.
Preferably, the multi-core DSP adopts a DSP-Q6713J/500 chip.
Preferably, in step (1), the DSP-Q6713J/500 includes 4 independent EMIF buses, and heartbeat registers are implemented in FPGA resources and accessed through the EMIF buses. After the system is powered on and running, each independent core maintains its heartbeat register and stores the maintenance result in the corresponding computing unit. The main control unit checks the heartbeat of each core; when the heartbeat of another unit is abnormal, the corresponding fault code is sent to the system control unit through the bus, thereby realizing fault monitoring of the multi-core processor functional circuit.
Preferably, step (2) performs quantitative testing of the complex secondary power supply network and determines the trend of the secondary power supply's working state through data acquisition and data analysis, so as to estimate the health of the secondary power supply.
Preferably, step (2) specifically comprises performing a full-edge scan test within the voltage range, providing voltage conditioning and a cross switch in the output network, and obtaining the amplitude of each secondary power supply through high-speed A/D sampling. Changes in the secondary power supply output are observed through statistics of the measured values. Because the performance of a secondary power supply generally degrades gradually until failure, the output voltage is monitored through long-term data sampling and analysis.
Preferably, in step (3), high-precision sensors acquire the parameters of the environment in which the system operates, and the environmental sensor readings are acquired, filtered and estimated in order to judge the reliability of the system's operating environment.
Preferably, in step (4), the high-speed 1553B bus is checked by communication detection in an offline mode.
Preferably, communication detection in the offline mode proceeds as follows: the platform acts as a remote terminal (RT) in the system, and control and test instructions are issued by the bus controller (BC) of the system. In the power-on built-in test (BIT) of the high-speed 1553B bus, a vector word command test and a data test are performed on each of channels A and B of the 1553B bus. The power-on BIT of the system is counted as passed only after the channel switching tests of both the A and B buses have passed, which ensures that both redundant channels are covered by the power-on BIT.
The technical effects of the invention are as follows: the invention solves the problem of on-board resource fault detection and health management for navigation core resources, realizes a fault prediction mechanism for embedded equipment used in harsh environments, and improves the reliability of the system.
Drawings
FIG. 1 is a schematic diagram of the fault monitoring and health management data flow of the core computing platform.
FIG. 2 is a diagram of the interfaces and functions of the core computing platform.
FIG. 3 shows the secondary power supply network of the core computing platform.
FIG. 4 is a schematic circuit diagram of platinum resistance (RTD) temperature measurement.
FIG. 5 is a schematic diagram of the air pressure acquisition principle.
FIG. 6 shows the high-speed 1553B bus interface circuit.
Detailed Description
The invention is further described below with reference to the accompanying drawings, and its specific implementation steps are described with reference to figs. 1-6. The fault detection and health management method of the multiprocessor computer has been successfully implemented in a navigation system.
A fault detection and health management method for a multiprocessor computer is based on a core computing platform composed of a data acquisition part, a data processing part, and a state monitoring and fault diagnosis part. The architecture of the core computing platform is shown in fig. 1. The data acquisition part comprises a plurality of sensors, a processor monitoring module and a bus interface module. The method specifically comprises the following steps:
(1) detecting faults of the multi-core DSP core circuit by means of a multi-core heartbeat circuit;
(2) monitoring the secondary power supply network and predicting its faults;
(3) acquiring and monitoring harsh-environment parameter sensors;
(4) monitoring high-speed (4M) 1553B bus communication.
The multi-core DSP is a multi-core DSP-Q6713J/500 chip developed by the national defense science, and an SMQ2V1000 chip is selected as the programmable logic chip. The processor comprises 4 cores based on the C6713, with a maximum operating frequency of 500 MHz.
The four cores in the system perform the following work respectively:
the first core manages the 1553B bus, the high-speed serial bus and the low-speed serial bus, and switches between the internal and external frequency standards;
the second core controls the motor and acquires the state of the locking mechanism;
the third core implements the user self-calibration function;
the fourth core performs health management and makes fault diagnosis decisions and judgments.
In step (1), the DSP-Q6713J/500 includes 4 independent EMIF buses, and heartbeat registers are implemented in FPGA resources and accessed through the EMIF buses. After the system is powered on and running, each independent core maintains its heartbeat register and stores the maintenance result in the corresponding computing unit. The main control unit checks the heartbeat of each core; when the heartbeat of another unit is abnormal, the corresponding fault code is sent to the system control unit through the bus, thereby realizing fault monitoring of the multi-core processor functional circuit, as shown in fig. 2. In addition, the registers are connected to an external test interface as discrete signals, and all 4 DSP discrete output signals are optically isolated. The signal isolation design uses the HCPL-6651 or HCPL-0631. The operating health of the multi-core processor can therefore also be obtained through external testing.
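The heartbeat check described above can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the patented circuit: the `HeartbeatMonitor` class, the model of each heartbeat register as an incrementing counter, the `max_missed` tolerance and the fault-code values are all hypothetical, since the source says only that each core maintains a heartbeat register and that the main control unit reports a fault code when another unit's heartbeat is abnormal.

```python
# Hypothetical fault-code scheme: base value plus core index.
FAULT_CODE_BASE = 0x10


class HeartbeatMonitor:
    """Flags cores whose heartbeat counter stops changing."""

    def __init__(self, num_cores=4, max_missed=3):
        self.max_missed = max_missed      # assumed tolerance, in polls
        self._last = [None] * num_cores   # counter value seen at the previous poll
        self._missed = [0] * num_cores    # consecutive polls without a change

    def poll(self, registers):
        """Compare the current register values with the previous poll.

        Returns fault codes for every core whose counter has been
        unchanged for at least `max_missed` consecutive polls.
        """
        faults = []
        for core, value in enumerate(registers):
            if self._last[core] is not None and value == self._last[core]:
                self._missed[core] += 1
            else:
                self._missed[core] = 0
            self._last[core] = value
            if self._missed[core] >= self.max_missed:
                faults.append(FAULT_CODE_BASE + core)
        return faults


def simulate_stalled_core(stalled=2, polls=6):
    """Simulate four cores where one stops updating its heartbeat register."""
    monitor = HeartbeatMonitor()
    registers = [0, 0, 0, 0]
    faults = []
    for _ in range(polls):
        for core in range(4):
            if core != stalled:
                registers[core] += 1  # a healthy core increments its counter
        faults = monitor.poll(registers)
    return faults
```

Requiring several consecutive missed updates before raising a fault is one way to keep the false alarm rate low, at the cost of slower detection; the right tolerance would depend on the real polling period.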
In step (2), the core computing platform carries many functions, such as various interfaces, buses and motor servo circuits, as shown in fig. 2. The secondary power supply network is correspondingly complex, and a fault in any one secondary power supply may affect normal operation of the system, so monitoring the secondary power supply network is key to the healthy operation of the system.
Power supply monitoring performs quantitative testing of the secondary power supply outputs: a full-edge scan test is carried out within the voltage range, voltage conditioning and a cross switch are provided in the output network, and the amplitude of each secondary power supply is obtained through high-speed A/D sampling. Changes in the secondary power supply output are observed through statistics of the measured values. Because the performance of a secondary power supply generally degrades gradually until failure, the output voltage is monitored through long-term data sampling and analysis, so that early warning of power supply failure can be given.
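One possible way to turn long-term voltage samples into an early warning is to fit a straight line to the samples and extrapolate it to the edge of a tolerance band. The sketch below assumes a linear drift model, a ±5 % tolerance band and a minimum-slope threshold; these choices, and the function names, are illustrative assumptions, since the source does not specify which statistics are used.

```python
def fit_line(times, values):
    """Ordinary least-squares fit: values ~ slope * t + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def hours_until_out_of_tolerance(times, values, nominal,
                                 tolerance=0.05, min_slope=1e-9):
    """Extrapolate the fitted trend to the edge of the tolerance band.

    Returns None when the drift is below `min_slope` (volts per hour),
    otherwise the hours remaining after the last sample (0.0 if the
    fitted trend has already left the band).
    """
    slope, intercept = fit_line(times, values)
    if abs(slope) < min_slope:
        return None
    # Upward drift hits the upper limit, downward drift the lower one.
    factor = 1 + tolerance if slope > 0 else 1 - tolerance
    t_cross = (nominal * factor - intercept) / slope
    return max(0.0, t_cross - times[-1])
```

For example, a 3.3 V rail sampled hourly for 100 hours while sagging by 1 mV per hour would be predicted to leave a 5 % band roughly 66 hours after the last sample. A real monitor would also need to handle measurement noise and non-linear degradation.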
The temperature and air pressure acquisition circuit in step (3) not only provides the system with compensation parameters for functional components, but also monitors whether the sensor itself is working within its expected operating environment. When the ambient temperature and air pressure are severe, the acquisition accuracy of the sensor can drop considerably. The operating temperature and air pressure environment of the sensor is therefore monitored, and a severe environment is reported to the system in time.
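The acquire, filter and judge sequence for an environmental parameter might look like the following sketch. The sliding-window median filter, the window length and the limit values are assumptions chosen for illustration; the source does not name a specific filter or threshold.

```python
from collections import deque
from statistics import median


class EnvironmentMonitor:
    """Median-filters raw readings and checks them against fixed limits."""

    def __init__(self, low, high, window=5):
        self.low = low
        self.high = high
        self._samples = deque(maxlen=window)  # most recent raw readings

    def update(self, raw):
        """Add one raw reading; return (filtered_value, in_range)."""
        self._samples.append(raw)
        filtered = median(self._samples)
        return filtered, self.low <= filtered <= self.high
```

A median filter rejects isolated spikes, so a single corrupted sample does not trigger a report, while a sustained excursion does once it fills more than half the window.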
The core computing platform uses a platinum resistance sensor for temperature acquisition. As shown in fig. 4, the temperature acquisition circuit is designed around the ADS1148, an A/D chip dedicated to high-precision temperature measurement. The ADS1148 from TI is a highly integrated 16-bit precision ADC that accepts 4 differential analog inputs and contains two identical constant current sources (IDACs), which avoids introducing measurement errors. The ADS1148 communicates with the DSP over an SPI bus. The three-wire RTD (platinum resistor) connection uses a ratiometric structure to generate the reference voltage, which improves the accuracy of the system. In addition, to address the nonlinear characteristic of the temperature sensor, the nonlinear error of the temperature measurement is corrected and compensated by least-squares polynomial fitting combined with binary-search (bisection) table lookup, further improving the temperature measurement accuracy of the system.
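The table-lookup half of the nonlinearity correction can be sketched as follows. The example assumes a PT100 element (100 Ω at 0 °C) with the standard IEC 60751 Callendar-Van Dusen coefficients, a table at 10 °C steps over -50 to 150 °C, and linear interpolation inside each interval; none of these specifics come from the source, and the least-squares polynomial fit it also mentions is not shown here.

```python
from bisect import bisect_right

R0 = 100.0      # assumed PT100 element: 100 ohms at 0 degrees C
A = 3.9083e-3   # IEC 60751 Callendar-Van Dusen coefficients
B = -5.775e-7
C = -4.183e-12  # applies only below 0 degrees C


def rtd_resistance(t_c):
    """Resistance in ohms of the platinum RTD at t_c degrees Celsius."""
    ratio = 1 + A * t_c + B * t_c ** 2
    if t_c < 0:
        ratio += C * (t_c - 100) * t_c ** 3
    return R0 * ratio


# Lookup table at 10 degree steps over an assumed -50..150 degree range.
TABLE_T = [float(t) for t in range(-50, 151, 10)]
TABLE_R = [rtd_resistance(t) for t in TABLE_T]


def temperature_from_resistance(r_ohm):
    """Binary-search the table, then interpolate linearly in the interval."""
    if not TABLE_R[0] <= r_ohm <= TABLE_R[-1]:
        raise ValueError("resistance outside table range")
    # Index of the upper end of the interval containing r_ohm.
    i = min(bisect_right(TABLE_R, r_ohm), len(TABLE_R) - 1)
    r_lo, r_hi = TABLE_R[i - 1], TABLE_R[i]
    t_lo, t_hi = TABLE_T[i - 1], TABLE_T[i]
    return t_lo + (t_hi - t_lo) * (r_ohm - r_lo) / (r_hi - r_lo)
```

Binary search keeps the lookup cost logarithmic in the table size, which suits a DSP that must convert readings at a fixed rate. A finer table or a polynomial fit per interval would reduce the interpolation error further.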
The analog signal interface for atmospheric pressure is provisionally specified as 0-5 V. For consistency of chips, interfaces and logic, the air pressure voltage is acquired with the same ADC chip used in the platinum resistance temperature measuring circuit: after conditioning by an FX147 at the front end, the signal is input to the ADS1148. The circuit principle is shown in fig. 5.
In step (4), the high-speed 1553B bus is the communication bus of the system; the high-speed 1553B bus interface circuit is shown in fig. 6. All system commands are issued over the 1553B bus, so the reliability of bus operation directly determines the reliability of the interaction between the core processing platform and the system.
The high-speed 1553B bus in the core processing platform is checked by communication detection in an offline mode: while offline, transmission and response are exercised separately on channels A and B of the high-speed bus, and the physical link layer of the redundant A/B channels is checked by sending an instruction on a fixed channel and returning it on a fixed channel, which ensures the reliability of the bus link. The platform acts as a remote terminal (RT) in the system, and control and test instructions are both issued by the bus controller (BC) of the system. In the power-on built-in test (BIT) of the high-speed 1553B bus, a vector word command test and a data test are performed on each of channels A and B of the 1553B bus. The power-on BIT of the system is counted as passed only after the A and B bus channel switching tests have passed, which ensures that both redundant channels are covered by the power-on BIT.
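The pass criterion for the power-on BIT can be expressed compactly: every test must pass on every redundant channel. In the sketch below, `run_test` is a hypothetical callback standing in for the real bus transactions, and the test names are labels chosen for illustration.

```python
CHANNELS = ("A", "B")
TESTS = ("vector_word", "data")


def power_on_bit(run_test):
    """Run every test on every redundant channel.

    `run_test(channel, test)` is a hypothetical callback that returns
    True on success. Returns (passed, results), where results maps each
    (channel, test) pair to its outcome so that a failure can be
    isolated to a single channel.
    """
    results = {(ch, t): bool(run_test(ch, t))
               for ch in CHANNELS for t in TESTS}
    return all(results.values()), results
```

Keeping the per-channel results alongside the overall verdict supports fault isolation: the system can still tell which channel failed even though the BIT as a whole is reported as failed.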

Claims (8)

1. A fault detection and health management method for a multiprocessor computer, based on a core computing platform composed of a data acquisition part, a data processing part, and a state monitoring and fault diagnosis part, wherein the data acquisition part comprises a plurality of sensors, a processor monitoring module and a bus interface module, characterized in that the fault detection and health management method specifically comprises the following steps:
(1) detecting faults of the multi-core DSP core circuit by means of a multi-core heartbeat circuit;
(2) monitoring the secondary power supply network and predicting its faults;
(3) acquiring and monitoring harsh-environment parameter sensors;
(4) monitoring high-speed 1553B bus communication.
2. A method of fault detection and health management for a multiprocessor computer as claimed in claim 1, wherein: the multi-core DSP adopts a DSP-Q6713J/500 chip.
3. A fault detection and health management method of a multiprocessor computer as claimed in claim 1 or 2, characterized in that: in step (1), the DSP-Q6713J/500 chip comprises 4 independent EMIF buses; heartbeat registers are implemented in FPGA resources and accessed through the EMIF buses; after the system is powered on and running, each independent core maintains its heartbeat register and stores the maintenance result in the corresponding computing unit; the main control unit checks the heartbeat of each core, and when the heartbeat of another unit is abnormal, the corresponding fault code is sent to the system control unit through the bus, thereby realizing fault monitoring of the multi-core processor functional circuit.
4. A method of fault detection and health management for a multiprocessor computer as claimed in claim 1, wherein: step (2) performs quantitative testing of the complex secondary power supply network and determines the trend of the secondary power supply's working state through data acquisition and data analysis, so as to estimate the health of the secondary power supply.
5. A fault detection and health management method of a multiprocessor computer as claimed in claim 1 or 4, characterized in that: step (2) specifically comprises performing a full-edge scan test within the voltage range, providing voltage conditioning and a cross switch in the output network, obtaining the amplitude of each secondary power supply through high-speed A/D sampling, observing changes in the secondary power supply output through statistics of the measured values, and, because the performance of a secondary power supply generally degrades gradually until failure, monitoring the secondary power supply output voltage through long-term data sampling and analysis.
6. A method of fault detection and health management for a multiprocessor computer as claimed in claim 1, wherein: in step (3), high-precision sensors acquire the parameters of the environment in which the system operates, and the environmental sensor readings are acquired, filtered and estimated in order to judge the reliability of the system's operating environment.
7. A method of fault detection and health management for a multiprocessor computer as claimed in claim 1, wherein: in step (4), the high-speed 1553B bus is checked by communication detection in an offline mode.
8. A method of fault detection and health management for a multiprocessor computer as claimed in claim 7, wherein: communication detection in the offline mode proceeds as follows: the platform acts as an RT in the system, and control and test instructions are issued by the BC of the system; in the power-on BIT of the high-speed 1553B bus, a vector word command test and a data test are performed on each of channels A and B of the 1553B bus; the power-on BIT of the system is counted as passed only after the A and B bus channel switching tests have passed, which ensures that both redundant channels are covered by the power-on BIT.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911133703.0A CN110989427A (en) 2019-11-19 2019-11-19 Fault detection and health management method for multiprocessor computer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911133703.0A CN110989427A (en) 2019-11-19 2019-11-19 Fault detection and health management method for multiprocessor computer

Publications (1)

Publication Number Publication Date
CN110989427A true CN110989427A (en) 2020-04-10

Family

ID=70084867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911133703.0A Pending CN110989427A (en) 2019-11-19 2019-11-19 Fault detection and health management method for multiprocessor computer

Country Status (1)

Country Link
CN (1) CN110989427A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112033436A (en) * 2020-08-07 2020-12-04 苏州天麓智能科技有限责任公司 Fault diagnosis method of laser gyro inertial navigation system based on BIT test technology
CN112214380A (en) * 2020-11-05 2021-01-12 中国航空工业集团公司西安航空计算技术研究所 Working life monitoring method for embedded computer
CN114020070A (en) * 2021-10-15 2022-02-08 北京航天控制仪器研究所 Temperature control system for compatible two-type inertial platform

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103701663A (en) * 2013-12-25 2014-04-02 北京航天测控技术有限公司 1553B bus program control fault injection device
US20170269984A1 (en) * 2016-03-18 2017-09-21 Qualcomm Incorporated Systems and methods for improved detection of processor hang and improved recovery from processor hang in a computing device
CN108318028A (en) * 2017-12-20 2018-07-24 中国航空工业集团公司西安航空计算技术研究所 A kind of navigation system core processing circuit design method
CN109921958A (en) * 2019-03-19 2019-06-21 北京润科通用技术有限公司 A kind of 1553B bus detection device, system and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
任续津,等: "地空导弹在架测试技术研究", 《现代防御技术》, no. 2, 30 April 2018 (2018-04-30), pages 173 - 179 *
李晓颖,等: "某导弹武器系统1553B总线监测系统设计", 《弹箭与制导学报》, no. 1, 29 February 2016 (2016-02-29), pages 171 - 173 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112033436A (en) * 2020-08-07 2020-12-04 苏州天麓智能科技有限责任公司 Fault diagnosis method of laser gyro inertial navigation system based on BIT test technology
CN112214380A (en) * 2020-11-05 2021-01-12 中国航空工业集团公司西安航空计算技术研究所 Working life monitoring method for embedded computer
CN114020070A (en) * 2021-10-15 2022-02-08 北京航天控制仪器研究所 Temperature control system for compatible two-type inertial platform
CN114020070B (en) * 2021-10-15 2023-03-31 北京航天控制仪器研究所 Temperature control system for compatible two-type inertial platform

Similar Documents

Publication Publication Date Title
CN110989427A (en) Fault detection and health management method for multiprocessor computer
CN103728965B (en) Monitoring device and method for aircraft engine and FADEC system
US9489340B2 (en) Electrical power health monitoring system
CN109976141B (en) UAV sensor signal redundancy voting system
US20100100259A1 (en) Fault diagnosis device and method for optimizing maintenance measures in technical systems
WO2012057378A1 (en) Universal sensor self-diagnosis device and diagnosis method therefor
CN103544092A (en) Health monitoring system of avionic electronic equipment based on ARINC653 standard
CN111611114A (en) Integrated avionics PHM system
CN100507580C (en) Electronic type transformer high voltage side redundant backup circuit and failure detection method
CN111176548B (en) SiP-based integrated spaceborne computer system
US20230408600A1 (en) Battery sampling chip and battery management system
CN103163486A (en) Radar antenna power source failure detection circuit
Stankunas et al. Experimental research of wireless sensor network application in aviation
CN113589133A (en) Lean built-in self-checking circuit
Lyu et al. Prognostics and health management technology for radar system
CN106940544A (en) Airborne-bus communication control method based on DSP and CPLD
Zhang Aviation manufacturing equipment based WSN security monitoring system
CN111639070A (en) Redundant data screening, measuring and controlling method and device using same
CN102914360A (en) Monitoring device and monitoring method for vibration of redundancy type wind turbine generator
CN112345217A (en) Intelligent health monitoring system for residual fatigue life of key part of airplane
CN112505739A (en) Total radiometer abnormality detection method and device and total radiometer
CN108444376B (en) Super-large scale real-time distributed strain measurement system
CN108226662B (en) Airborne computer fault prediction method
Zhang et al. Design on Universal Flight Test Platform for Aerospace Components
Ahmed et al. Holistic IJTAG-based External and Internal Fault Monitoring in UAVs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination