CN113128802A - Risk control method and device for high-safety electronic system - Google Patents


Publication number
CN113128802A
Authority
CN
China
Prior art keywords
fault
determining
failure
rate
meets
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911403887.8A
Other languages
Chinese (zh)
Inventor
王群勇
严拴航
陈冬梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING SHENGTAOPING TEST ENGINEERING TECHNOLOGY RESEARCH INSTITUTE
Original Assignee
BEIJING SHENGTAOPING TEST ENGINEERING TECHNOLOGY RESEARCH INSTITUTE
Application filed by BEIJING SHENGTAOPING TEST ENGINEERING TECHNOLOGY RESEARCH INSTITUTE
Priority to CN201911403887.8A
Publication of CN113128802A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G06Q10/20 Administration of product repair or maintenance

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the invention provide a risk control method and device for a high-safety electronic system. The method comprises: acquiring all fault classes under each category of functional failure state, and determining the predicted values of all fault rates and the recovery times of the faults, so as to determine that the system as a whole meets the task requirements; determining that each fault has a corresponding defense-in-depth strategy; determining, from the predicted value and the index value of each fault rate, that each fault class meets a preset safety margin; determining the key fault types according to the degree of harm caused by each fault class and whether that harm is acceptable; and determining that each key fault type has a monitoring system that can monitor the occurrence of the fault and its recovery time. The method ensures that the system as a whole meets the task requirements, that each fault meets the preset safety margin, and that the risk is kept in a controlled state. All key faults can be monitored in time, which facilitates the implementation of mitigation measures.

Description

Risk control method and device for high-safety electronic system
Technical Field
The invention relates to the field of risk control of electronic systems, in particular to a risk control method and device of a high-safety electronic system.
Background
Electronic systems are subject to environmental influences, such as space radiation, while performing their tasks. The space radiation environment refers to all natural radiation from the ground to outer space. The natural radiation environment includes galactic cosmic rays, solar cosmic rays, the Earth's trapped radiation belts, and atmospheric neutron radiation. The region from the ground up to an altitude of 36,000 km is the everyday working environment of electronic systems such as satellites, aircraft, and ground network computers. Spacecraft such as satellites operating above 100 km are mainly affected by galactic cosmic rays, solar cosmic rays, and the Earth's trapped radiation belts; the radiation environment of near-space vehicles between 20 km and 100 km consists mainly of neutrons, protons, electrons, and the like; aircraft between 3 km and 20 km are mainly affected by atmospheric neutron radiation; and the radiation environment below 3 km and on the ground consists mainly of neutrons.
Radiation particles in the space radiation environment can cause severe radiation damage to materials and radiation-sensitive devices. By type, radiation damage effects are classified into the Single Event Effect (SEE), the Total Ionizing Dose effect (TID), the Displacement Damage effect (DD), and so on, any of which can cause electronic devices to malfunction. Cumulative effects such as the total dose effect and the displacement damage effect can cause hard failures of the semiconductor integrated circuits in electronic systems, while transient effects such as the single event effect can cause hard and/or soft failures of those circuits. Hazards of the natural space environment can therefore affect the success of a task. A risk impact evaluation method is needed that assesses the effect of space environment hazards on task success, so as to guide, quantitatively and accurately, the selection of key devices of the electronic system, the protection design of key functions, and the reliability improvement of key systems, to avoid both over-design and under-design, and to ensure the success of the task.
Disclosure of Invention
In order to solve the above problems, embodiments of the present invention provide a risk control method and device for a high-safety electronic system.
In a first aspect, an embodiment of the present invention provides a risk control method for a high-safety electronic system, including: acquiring all fault classes under each category of functional failure state according to the categories of functional failure states, thereby acquiring all fault classes of the system, and determining the predicted values of all fault rates and the recovery times of the faults; determining that the system as a whole meets the task requirement according to the predicted values of all fault rates and the recovery times of the faults; determining that each fault has a corresponding defense-in-depth strategy; determining that each fault class meets a preset safety margin according to the predicted value of the fault rate and the index value of the fault rate; determining the key fault types according to the degree of harm caused by each fault class and whether that harm is acceptable; and determining that each key fault type has a monitoring system that can monitor the occurrence of the fault and its recovery time.
Further, determining that the system as a whole meets the task requirement according to the predicted values of all fault rates and the recovery times of the faults includes: according to an availability formula, if the availability of the system meets a preset threshold, the system as a whole meets the task requirement. The availability formula is:
[Formula image BDA0002348111200000021: availability formula, not reproduced in the text]
Accordingly, the method further comprises: adjusting the mitigation measures of each fault according to the availability formula so that the system availability meets the preset threshold; where A is the system availability, i is the fault class, λ_i is the failure rate of class i faults, and T_i is the recovery time of class i faults.
Further, the categories of the functional failure states include: planned outage short term hard failures, planned outage long term hard failures, unplanned outage short term hard failures, unplanned outage long term hard failures, unplanned outage short term soft failures, and unplanned outage long term soft failures.
Further, the method further comprises: determining the fault objects that require priority monitoring, and acquiring the true value of the fault rate; and determining, from the true value and the index value of the fault rate, whether the index value meets the task requirement, and if not, adjusting the index value.
Further, the fault objects that require priority monitoring include faults that do not meet the preset safety margin.
Further, determining the predicted values of all fault rates includes: according to the BOM list of the system, obtaining the total fault rate of each component after it responds to all physical failure mechanisms of the space environment, and determining the predicted value of each fault rate.
Further, obtaining the total fault rate of each component after it responds to all physical failure mechanisms includes: acquiring the single event effect fault rate, the total dose effect fault rate, and the displacement damage fault rate caused by the response to space radiation, to obtain the space radiation fault rate; acquiring the non-space-radiation fault rate caused by the response of the component to all non-space-radiation physical failure mechanisms; and taking the sum of the space radiation fault rate and the non-space-radiation fault rate, combined with the device manufacturing parameters and process parameters, to obtain the total fault rate of the component in the space environment.
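The summation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `ComponentRates`, the BOM entries, and every numeric rate are hypothetical, and the adjustment by device manufacturing and process parameters is omitted because the text does not specify its form.

```python
from dataclasses import dataclass


@dataclass
class ComponentRates:
    """Fault-rate contributions for one BOM item, in faults per hour (hypothetical)."""
    see: float            # single event effect fault rate
    tid: float            # total dose effect fault rate
    dd: float             # displacement damage fault rate
    non_radiation: float  # all non-space-radiation mechanisms combined

    def space_radiation_rate(self) -> float:
        # Space radiation fault rate = SEE + TID + DD contributions.
        return self.see + self.tid + self.dd

    def total_rate(self) -> float:
        # Total = space radiation rate + non-space-radiation rate.
        return self.space_radiation_rate() + self.non_radiation


# Hypothetical BOM with made-up rates, for illustration only.
bom = {
    "fpga": ComponentRates(see=2e-7, tid=1e-8, dd=5e-9, non_radiation=3e-8),
    "sram": ComponentRates(see=4e-7, tid=2e-8, dd=1e-9, non_radiation=1e-8),
}
totals = {name: rates.total_rate() for name, rates in bom.items()}
```

Keeping the radiation and non-radiation contributions separate makes it possible to see which mechanism dominates a component's predicted fault rate.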
In a second aspect, an embodiment of the present invention provides a risk control device for a high-safety electronic system, including: a fault determining module, used for acquiring all faults under each category of functional failure state according to the categories of functional failure states, thereby acquiring all faults of the system, and determining the predicted values of all fault rates and the recovery times of the faults; a task decision module, used for determining that the system as a whole meets the task requirements according to the predicted values of all fault rates and the recovery times of the faults; a defense-in-depth judgment module, used for determining that each fault has a corresponding mitigation measure; a design margin judgment module, used for determining that each fault meets the preset safety margin according to the predicted value of the fault rate and the index value of the fault rate; a key fault determining module, used for determining the key fault types according to the degree of harm caused by each fault class and whether that harm is acceptable; and a monitoring configuration module, used for determining that each key fault has a monitoring system that can monitor the occurrence of the fault and its recovery time.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the risk control method for a high-safety electronic system according to the first aspect of the present invention.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the risk control method for a high-safety electronic system according to the first aspect of the present invention.
According to the risk control method and device for a high-safety electronic system provided by the embodiments of the present invention, all faults under each category of functional failure state are acquired according to the categories of functional failure states, so that all faults of the system are acquired, and the predicted values of all fault rates and the recovery times of the faults are determined. This makes it possible to capture all faults that affect task risk, together with their fault rates. The system as a whole is determined to meet the task requirement from the predicted values of all fault rates and the recovery times of the faults, and each fault is determined to meet a preset safety margin from the predicted value and the index value of its fault rate, so that the risk is kept in a controlled state. Each key fault is determined to have a monitoring system, so that all key faults can be monitored in time, which facilitates the implementation of mitigation measures. The method thereby achieves effective risk control of the system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flowchart of a risk control method for a high-safety electronic system according to an embodiment of the present invention;
FIG. 2 is a block diagram of a risk control device for a high-safety electronic system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Traditional methods for evaluating reliability in the space radiation environment, in China and abroad, have three main defects:
(1) A basic theory is lacking, so the quantitative influence of the space radiation effect on the success or failure of a spacecraft task cannot be scientifically explained.
Existing spacecraft mainly adopt a device-level radiation evaluation method, in which the Single Event Effect (SEE), the total dose effect (TID), and the displacement damage effect (DD) are tested and evaluated independently. Neither the combined influence of the three types of radiation effects on a device nor their combined influence on the electronic system of the spacecraft is considered. At present, the Radiation Design Margin (RDM) method is mainly used, but no close quantitative relationship has been established between RDM and the task indexes of the spacecraft, so the combined influence of the three types of radiation effects on the spacecraft's ability to complete a specific task cannot be evaluated accurately and quantitatively. Because a basic theory is lacking, the quantitative influence of the space radiation effect on the success or failure of a spacecraft task cannot be scientifically explained.
(2) An evaluation method is lacking, so the quantitative influence of the space radiation effect on the success or failure of a spacecraft task cannot be calculated.
The existing reliability prediction engineering methods for aerospace electronic payloads, such as GJB299 and MIL-HDBK-217, do not include space radiation environmental stress, so a certain task risk exists. MIL-HDBK-217F, the handbook for reliability prediction of electronic equipment, notes in its text that none of the models in the handbook can be used to predict nuclear survivability or the effects of ionizing radiation. The recommended value for the space environment in its microcircuit failure rate prediction model is 0.5, which treats the space environment as benign and does not match the actual space radiation environment. Without a theoretical reliability basis and corresponding tools to guide the selection of devices that meet specific task indexes, the space radiation reliability of a satellite may be over-designed in some places and under-designed in others at the same time, which creates task risk. Because an evaluation method is lacking, the quantitative influence of the space radiation effect on the success or failure of a spacecraft task cannot be calculated.
(3) Effective data are lacking, so the quantitative influence of the space radiation effect on the success or failure of a spacecraft task cannot be calculated accurately.
Existing spacecraft mainly adopt a device-level radiation evaluation method, in which the Single Event Effect (SEE), the total dose effect (TID), and the displacement damage effect (DD) are tested and evaluated independently. The single event effect index requirement for key devices is a heavy-ion single event latch-up LET threshold greater than 75 MeV·cm²/mg and a heavy-ion single event upset LET threshold greater than 37 MeV·cm²/mg, or an upset rate below 10^-7 per bit per day. The total dose effect index requirement for key devices is usually a total dose tolerance greater than 100 rad, and the displacement damage index requirement is usually a tolerance greater than 10^10/cm² of equivalent 10 MeV protons. Data of such different dimensions cannot be superposed into a combined effect. They are not the fault rate at which the space radiation effect decides the success or failure of the spacecraft task, and so they are not effective data for judging task success or failure. Because effective data are lacking, the quantitative influence of the space radiation effect on the success or failure of a spacecraft task cannot be calculated accurately.
In short, because a basic theory, an evaluation method, and effective data are all lacking, the success or failure of the tasks of spacecraft currently in orbit cannot be scientifically explained.
A test and evaluation method for the space radiation reliability of aerospace systems is therefore urgently needed, so that the widespread coexistence of over-design and under-design is avoided as far as possible, and the influence of space radiation environment effects on the risk of spacecraft task failure becomes quantifiable, understood, and controllable.
FIG. 1 is a flowchart of a risk control method for a high-safety electronic system according to an embodiment of the present invention. As shown in FIG. 1, the embodiment of the present invention provides a risk control method for a high-safety electronic system, including:
S0, acquiring all fault classes under each category of functional failure state according to the categories of functional failure states, thereby acquiring all faults of the system, and determining the predicted values of all fault rates and the recovery times of the faults.
Electronic systems in high-safety application fields such as nuclear power plants, satellites, and aircraft can enter various functional failure states (FFC) under the action of 7 stresses (temperature, humidity, electrical, chemical, vibration, temperature cycling, and radiation). When the aggregate frequency λ of these functional failure states exceeds an industry-recognized acceptable red-line threshold (e.g., 10^-9/h), a safety risk arises.
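The red-line comparison above can be illustrated with a short sketch. It assumes that the aggregate frequency is the plain sum of the per-state frequencies, which the text does not state explicitly; the function names and the example frequencies are hypothetical.

```python
import math

RED_LINE_PER_HOUR = 1e-9  # example threshold quoted in the text


def aggregate_frequency(state_frequencies):
    """Aggregate frequency (per hour); summation is an assumption of this sketch."""
    return math.fsum(state_frequencies)


def exceeds_red_line(state_frequencies, red_line=RED_LINE_PER_HOUR):
    """Return True when the aggregate frequency is above the red-line threshold."""
    return aggregate_frequency(state_frequencies) > red_line


# Made-up per-state frequencies, in occurrences per hour.
risky = exceeds_red_line([4e-10, 7e-10])
safe = exceeds_red_line([2e-10, 3e-10])
```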
The method is used to evaluate an electronic system before it performs a task in a space environment, so that the risk of performing the task is reduced to a minimum, or even to zero. The space environment may affect the electronic components of an electronic system located in it, producing space radiation effects (SRE) or space non-radiation effects. For example, radiation effects include charge-discharge effects, single event effects, and the like, which degrade the accuracy of electronic devices and cause errors. The space environment of the electronic system for the task to be evaluated, including data such as temperature, humidity, the concentration of each compound in the chemical environment, and radiation intensity, may be obtained first. A task is a task performed by an electronic system, such as an aviation task, an aerospace task, or a nuclear power plant monitoring task.
To facilitate identifying all faults, the embodiments of the present invention first determine the categories of functional failure states.
As a preferred example, the categories of functional failure states include: planned outage short term hard failures, planned outage long term hard failures, unplanned outage short term hard failures, unplanned outage long term hard failures, unplanned outage short term soft failures (with monitoring), and unplanned outage long term soft failures (without monitoring). That is, there are 6 categories of functional failure states, each containing multiple fault types.
A planned interruption is one that can be predicted in advance; for example, it is known at design time that the power of an electronic system will eventually run out. An unplanned interruption is one that cannot be predicted. Planned and unplanned interruptions are each divided into long-term and short-term interruptions, depending on whether a backup or a repair is available after the interruption occurs. For example, if the system can switch to a backup battery, the interruption is a short-term planned interruption. Soft errors are classified into short-term soft failures with monitoring and long-term soft failures without monitoring, depending on whether a monitoring system (BIST) is provided. The first four categories are hard failures and the last two are soft failures.
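The six categories can be written down as a small lookup, shown here as a sketch. The enum and the `classify` helper are hypothetical names; for soft failures, `short_term=True` stands for "monitoring is provided", following the rule stated above.

```python
from enum import Enum


class FailureState(Enum):
    PLANNED_SHORT_HARD = "planned outage, short-term hard failure"
    PLANNED_LONG_HARD = "planned outage, long-term hard failure"
    UNPLANNED_SHORT_HARD = "unplanned outage, short-term hard failure"
    UNPLANNED_LONG_HARD = "unplanned outage, long-term hard failure"
    UNPLANNED_SHORT_SOFT = "unplanned outage, short-term soft failure (monitored)"
    UNPLANNED_LONG_SOFT = "unplanned outage, long-term soft failure (unmonitored)"


# Keys are (planned, hard, short_term).
_TABLE = {
    (True, True, True): FailureState.PLANNED_SHORT_HARD,
    (True, True, False): FailureState.PLANNED_LONG_HARD,
    (False, True, True): FailureState.UNPLANNED_SHORT_HARD,
    (False, True, False): FailureState.UNPLANNED_LONG_HARD,
    (False, False, True): FailureState.UNPLANNED_SHORT_SOFT,
    (False, False, False): FailureState.UNPLANNED_LONG_SOFT,
}


def classify(planned: bool, hard: bool, short_term: bool) -> FailureState:
    """Map the three attributes onto one of the six listed categories."""
    if planned and not hard:
        # The text lists soft failures only under unplanned outages.
        raise ValueError("planned soft failures are not among the six categories")
    return _TABLE[(planned, hard, short_term)]


# A battery that runs out as predicted, with a backup battery available.
battery_case = classify(planned=True, hard=True, short_term=True)
```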
In the space environment, various physical failure mechanisms (including those arising inside components) affect the electronic system and cause component faults. The physical failure mechanisms include temperature, humidity, vibration, temperature cycling, chemical environment, electrical stress, radiation, and the like. Under each classification result, the failure probability caused by the response to each physical failure mechanism is obtained. The recovery time of a fault is the time from the occurrence of the fault to recovery from it, and follows from the countermeasure designed for that fault, for example the time needed to switch to a backup battery after the charge is depleted. For the 6 categories of functional failure states (FFC), the embodiment of the invention uses a five-step RIDM analysis method to determine the combinations of specific faults i, fault rates λ_i, and fault recovery times T_i within the 6 categories.
Within the 6 categories of functional failure states, the faults of each category may be further subdivided, for example by effect (neutron effect, proton effect), by stress (the 7 stresses above), by task profile, and so on. The risk control analysis of the system is then performed step by step according to the five-step RIDM-based analysis method.
S1, determining that the system as a whole meets the task requirement according to the predicted values of all fault rates and the recovery times of the faults.
After all faults i, fault rates λ_i, and fault recovery times T_i have been determined, whether the system as a whole meets the requirement is determined from these three.
As an alternative embodiment, the determination is made according to the following formula:
[Formula image BDA0002348111200000071: availability formula, not reproduced in the text]
where A is the system availability, i is the fault class, λ_i is the failure rate of class i faults, and T_i is the recovery time of class i faults.
That is, once all λ_i have been determined, the availability of the system as a whole can be calculated using the corresponding T_i. For example, if the preset threshold is 0.957, the value of A calculated from the formula must meet this threshold for the system as a whole to be judged as meeting the task requirement.
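The availability check can be sketched as follows. Because the formula image is not reproduced in the text, this sketch ASSUMES the common small-rate approximation A = 1 - Σ λ_i·T_i; the patent's actual formula may differ. The function names and the example rates and times are hypothetical; only the threshold 0.957 comes from the text.

```python
import math


def availability(faults):
    """Assumed form A = 1 - sum(lambda_i * T_i); not confirmed by the source.

    `faults` is a list of (failure_rate_per_hour, recovery_time_hours) pairs.
    """
    return 1.0 - math.fsum(rate * recovery for rate, recovery in faults)


def meets_task_requirement(faults, threshold=0.957):
    """Return True when the computed availability reaches the preset threshold."""
    return availability(faults) >= threshold


# Made-up fault classes: (lambda_i per hour, T_i in hours).
design_a = [(1e-4, 100.0), (1e-3, 20.0)]  # unavailability 0.03
design_b = [(1e-3, 50.0)]                 # unavailability 0.05
ok_a = meets_task_requirement(design_a)
ok_b = meets_task_requirement(design_b)
```

Under this assumed form, each product λ_i·T_i is the share of unavailability contributed by fault class i, which makes the largest contributors easy to spot.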
S2, determining that each fault has a corresponding defense-in-depth strategy.
The recovery time of a fault follows from the protective mitigation measures adopted when the system is designed. A defense-in-depth strategy comprises multiple protective mitigation measures. After it has been determined that the system as a whole meets the task requirement, it is further judged whether each fault has a corresponding defense-in-depth strategy. If not, one is added.
S3, determining that each fault class meets the preset safety margin according to the predicted value of the fault rate and the index value of the fault rate.
For the occurrence probability of each fault there are corresponding values λ_i-spec, λ_i-pre, and λ_i-real, which denote the index value, the predicted value, and the actual value respectively. In the embodiment of the invention, whether each fault meets the preset safety margin is determined from the predicted value and the index value. For example, if the predicted failure probability of a certain fault class is 10^-5 and the index value required by the task is 10^-4, the ratio of the index value to the predicted value is 10, so the safety margin of the system against this fault is 10 times. If this ratio is greater than the preset value, the preset safety margin is met. All faults need to meet the safety margin.
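The margin check can be sketched in a few lines, taking the safety margin as the index value divided by the predicted value, so that a predicted rate of 10^-5 against an index value of 10^-4 gives a margin of 10. The function names and the default required margin of 2 are assumptions made for illustration.

```python
def safety_margin(index_value, predicted_value):
    """Safety margin = index (required) fault rate / predicted fault rate."""
    if predicted_value <= 0:
        raise ValueError("predicted fault rate must be positive")
    return index_value / predicted_value


def meets_margin(index_value, predicted_value, required_margin=2.0):
    """required_margin is a hypothetical preset value; the text does not fix one."""
    return safety_margin(index_value, predicted_value) > required_margin


# Example from the text: index value 1e-4, predicted value 1e-5.
margin = safety_margin(1e-4, 1e-5)
passes = meets_margin(1e-4, 1e-5)
```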
S4, determining the key fault types according to the degree of harm caused by each fault class and whether that harm is acceptable.
First, the key fault types that produce a functional failure state may be determined according to the consequence or degree of harm caused by a fault. The embodiments of the present invention do not specifically limit how key faults are identified from the degree of harm caused by each fault class; the methods include, but are not limited to, using any one or a combination of the index required value of the fault, the predicted fault rate, the recovery time of the mitigation measure, and the safety margin of the fault. A threshold may be set on any single one of these parameters or on a combination of them.
Determining the key fault types according to the degree of harm caused by each fault class and whether that harm is acceptable includes determining the degree of harm from the index required value, and determining the key fault types accordingly. For fault classes with a large degree of harm, the index required value is usually set smaller, for example 10^-9. Faults whose fault rate index required value is less than 10^-6 can be selected as key fault types.
Determining the key fault types according to the degree of harm caused by each fault class and whether that harm is acceptable further includes: determining by the predicted fault rate, for example selecting faults with a predicted fault rate greater than 10^-6 as key fault types; or by the safety margin, for example treating fault types with a safety margin of less than 2 times as key faults; or by the availability formula, treating faults for which the product of the predicted fault rate and the fault recovery time exceeds a preset threshold as key faults; or determining the key fault types comprehensively from the system availability.
Second, the key fault types may be determined based on whether the harmful effect is acceptable. Whether the harmful effect is acceptable can be judged from the safety margin: if the safety margin is greater than a preset threshold, the harmful effect of the fault class is acceptable, and otherwise it is not. Finally, the key fault types can be determined comprehensively from both the degree of harm and whether the harmful effect is acceptable. For example, faults are divided by degree of harm according to index required values of 10^-9 (class A), 10^-7 (class B), and 10^-5 (class C), with the degree of harm decreasing in that order. Class A faults can be selected as key faults, and on that basis whether the harmful effect is acceptable can also be considered. For example, if the predicted value of a certain class A fault is 10^-10, giving a safety margin of 10 times, the harmful effect is considered acceptable and the fault is not treated as a key fault; if the predicted value of a certain class B fault is 10^-7, giving a safety margin of only 1 time, the harmful effect is considered unacceptable and the fault is treated as a key fault.
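One possible way of combining the two criteria is sketched below. The rule used here (a fault is key when its hazard class is in a configurable set of severe classes AND its safety margin falls below a threshold) is an assumption chosen to reproduce the class A and class B examples above; the text allows other combinations. The names `Fault`, `hazard_class`, and `is_key_fault`, the choice of severe classes, and the margin threshold of 2 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fault:
    name: str
    index_value: float      # required (index) fault rate
    predicted_value: float  # predicted fault rate


def hazard_class(index_value):
    """Class A/B/C by index required value, using the example boundaries in the text."""
    if index_value <= 1e-9:
        return "A"
    if index_value <= 1e-7:
        return "B"
    return "C"


def is_key_fault(fault, severe_classes=("A", "B"), margin_threshold=2.0):
    """Assumed rule: severe hazard class and an insufficient safety margin."""
    margin = fault.index_value / fault.predicted_value
    harm_acceptable = margin >= margin_threshold
    return hazard_class(fault.index_value) in severe_classes and not harm_acceptable


# The two worked examples from the text.
class_a = Fault("class A example", index_value=1e-9, predicted_value=1e-10)
class_b = Fault("class B example", index_value=1e-7, predicted_value=1e-7)
key_faults = [f.name for f in (class_a, class_b) if is_key_fault(f)]
```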
S5, determining that each key fault type has a monitoring system that can monitor the occurrence of the fault and its recovery time.
That is, it is ensured that each key fault i can be monitored and that data on its recovery time T_i can be obtained. It is checked whether a monitoring system (BIST) is provided for key fault i, and if not, a corresponding monitoring system is set up. The monitoring system monitors the occurrence of fault i; once the fault is detected, it is recovered through the preset mitigation measure, and the recovery time T_i is obtained. The design value of T_i may be evaluated and adjusted based on the actual recovery time of the fault.
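The monitoring check and the revision of T_i can be sketched as follows. The fault names and the function names are hypothetical, and the revision rule (never revise the design value downward, otherwise use the mean of the observed recovery times) is a conservative assumption; the text only says that the design value may be evaluated and adjusted.

```python
from statistics import mean


def faults_missing_monitor(key_faults, monitored_faults):
    """List the key faults for which no monitoring system (BIST) is configured."""
    monitored = set(monitored_faults)
    return [fault for fault in key_faults if fault not in monitored]


def revised_recovery_time(design_value, observed_times):
    """Assumed rule: take the larger of the design value and the observed mean."""
    if not observed_times:
        return design_value
    return max(design_value, mean(observed_times))


# Hypothetical key faults and monitoring configuration.
missing = faults_missing_monitor(["seu_sram", "latchup_fpga"], ["seu_sram"])
# Design value 10 h, observed recoveries of 12 h and 14 h.
revised = revised_recovery_time(10.0, [12.0, 14.0])
```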
According to the risk control method of the high-safety electronic system provided by this embodiment, all faults under each type of functional failure state are obtained according to the types of functional failure states, so that all faults of the system are obtained, and the predicted values of all failure rates and the fault recovery times are determined; this helps to capture every fault that affects task risk together with its failure rate. The system as a whole is determined to meet the task requirement according to the predicted values of all failure rates and the fault recovery times, and each fault is determined to meet the preset safety margin according to the predicted value and the index value of its failure rate, so that the risk is confined to a controlled state. Each critical fault is determined to have a monitoring system, so that all critical faults can be monitored in time, which facilitates implementation of the mitigation measures. The method thereby achieves effective risk control of the system.
Based on the content of the foregoing embodiment, as an optional embodiment, determining that the system as a whole meets the task requirement according to the predicted values of all failure rates and the fault recovery times comprises: according to an availability formula, if the availability of the system meets a preset threshold, the system as a whole meets the task requirement. The availability formula includes:
Figure BDA0002348111200000101
Accordingly, the method further comprises: adjusting the mitigation measure of each fault according to the availability formula so that the system availability meets the preset threshold; where $A$ is the system availability, $i$ is the fault class, $\lambda_i$ is the failure rate of class $i$ faults, and $T_i$ is the recovery time of class $i$ faults.
If the threshold is not met, the corresponding protective mitigation measures can be adjusted, and with them the recovery times $T_i$, until an $A$ meeting the preset condition is obtained, thereby realizing risk control of the system. That is, once all $\lambda_i$ have been determined, $T_i$ is adjusted by adjusting the mitigation strategy so that an $A$ meeting the preset condition is obtained, which realizes overall risk control of the system.
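The adjustment loop can be illustrated with a short Python sketch. The availability formula appears in the source only as an image, so the form used here, A = 1 - sum(λ_i · T_i), is an assumption, as are the units (per hour and hours), the sample values, the 0.9 shrink factor and the function names. The sketch is meant to show the iteration, not to reproduce the patented formula.

```python
from typing import Dict, Tuple

# fault class -> (failure rate lambda_i per hour, recovery time T_i in hours)
Faults = Dict[str, Tuple[float, float]]


def availability(faults: Faults) -> float:
    """ASSUMED form A = 1 - sum(lambda_i * T_i); the source formula is an image."""
    return 1.0 - sum(rate * t for rate, t in faults.values())


def tighten_recovery_times(faults: Faults, threshold: float,
                           factor: float = 0.9, max_iter: int = 1000) -> Faults:
    """Shorten the T_i of the largest contributor until A meets the threshold.

    The lambda_i are left unchanged: only the mitigation measures, and hence
    the recovery times, are adjusted.
    """
    current = dict(faults)
    for _ in range(max_iter):
        if availability(current) >= threshold:
            return current
        worst = max(current, key=lambda k: current[k][0] * current[k][1])
        rate, t = current[worst]
        current[worst] = (rate, t * factor)
    raise RuntimeError("threshold not reached; a bottleneck remains")


# Hypothetical fault classes and threshold.
faults = {"a": (1e-3, 10.0), "b": (1e-4, 5.0)}
adjusted = tighten_recovery_times(faults, threshold=0.995)
print(availability(faults), availability(adjusted), adjusted)
```

Shortening the recovery time of the largest contributor first is one possible strategy; a real design would change specific mitigation measures and then re-evaluate the resulting recovery times.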
The availability threshold may be set by experiment or analysis for different task environments, according to the characteristics of the electronic system or the task requirements. By adjusting the recovery time of each fault so that the availability is larger than the preset threshold, the risk is reduced to an acceptable range; omissions and gaps in the control of changing risks can be found, the critical items affecting the task index requirements can be identified, and defense-in-depth measures can be supplemented. In principle, multiple iterations can either achieve zero risk or expose the bottlenecks that prevent zero risk, so that the risk state is made clear.
Based on the content of the foregoing embodiment, as an optional embodiment, the preset classification manner includes: planned outage short term hard failures, planned outage long term hard failures, unplanned outage short term hard failures, unplanned outage long term hard failures, unplanned outage short term soft failures (with monitoring), and unplanned outage long term soft failures (without monitoring). The details have been mentioned in the above embodiments, and are not described herein again. The method for dividing the abnormal types can analyze each type of abnormal types in a targeted manner, and is beneficial to the classification of fault types.
As an alternative embodiment, the planned interruption short term hard failures may not be taken as a fault i and corresponding fault rate in the determination of whether the mission requirements are met as a whole.
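The six categories and the optional exclusion can be written down as a small data structure. The enum and function names below are hypothetical and the sample faults are invented for illustration; only the category list itself comes from the text.

```python
from enum import Enum
from typing import List, Tuple


class FailureCategory(Enum):
    PLANNED_SHORT_HARD = "planned interruption, short-term hard failure"
    PLANNED_LONG_HARD = "planned interruption, long-term hard failure"
    UNPLANNED_SHORT_HARD = "unplanned interruption, short-term hard failure"
    UNPLANNED_LONG_HARD = "unplanned interruption, long-term hard failure"
    UNPLANNED_SHORT_SOFT = "unplanned interruption, short-term soft failure (with monitoring)"
    UNPLANNED_LONG_SOFT = "unplanned interruption, long-term soft failure (without monitoring)"


def faults_for_task_check(
    faults: List[Tuple[str, FailureCategory]], exclude_planned_short: bool = True
) -> List[Tuple[str, FailureCategory]]:
    """Optionally drop planned-interruption short-term hard failures
    before the system-level task requirement check."""
    if not exclude_planned_short:
        return list(faults)
    return [(name, cat) for name, cat in faults
            if cat is not FailureCategory.PLANNED_SHORT_HARD]


# Hypothetical faults.
sample = [
    ("scheduled reset", FailureCategory.PLANNED_SHORT_HARD),
    ("memory upset", FailureCategory.UNPLANNED_SHORT_SOFT),
    ("latch-up burnout", FailureCategory.UNPLANNED_LONG_HARD),
]
print([name for name, _ in faults_for_task_check(sample)])
```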
Based on the content of the foregoing embodiment, as an optional embodiment, the method further includes: determining a fault object needing important monitoring, and acquiring a true value of a fault rate; and determining whether the fault rate index value meets the task requirement or not according to the fault rate true value and the fault rate index value, and if not, adjusting the index value.
For some fault types it is necessary to further judge whether the index value is set reasonably; such faults are the fault objects needing important monitoring. The fault objects needing important monitoring may be all faults or only the critical fault types, and may be determined according to the degree of influence caused by the faults.
The true value of the failure rate can be determined from test results of the system, or of the components forming the system, in the space environment. The index value is then verified against the true value to check whether the index requirement meets the task requirement; that is, whether the index is reasonable is verified by comparing the ratio of the true value to the index value with a preset threshold. If the index value is not reasonable, it is adjusted accordingly, and the adjusted index value is used in the RIDM 5-step analysis method. Through multiple iterations, zero risk can be achieved, or the bottlenecks that prevent zero risk can be found and the risk state clarified, so that the goal of zero risk is achieved in principle.
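The ratio test described here might look like the following sketch. The direction of the comparison (the ratio of true value to index value must not exceed the threshold), the default threshold of 1.0, the sample rates and the function names are assumptions; the source does not say how the index value is adjusted, so the sketch only reports which index values need review.

```python
from typing import Dict, List, Tuple


def index_is_reasonable(true_rate: float, index_rate: float,
                        max_ratio: float = 1.0) -> bool:
    """Assumed test: true failure rate / index value must not exceed max_ratio."""
    if index_rate <= 0:
        raise ValueError("index value must be positive")
    return true_rate / index_rate <= max_ratio


def indices_to_adjust(rates: Dict[str, Tuple[float, float]],
                      max_ratio: float = 1.0) -> List[str]:
    """Return the faults whose index value fails the ratio test.

    `rates` maps a fault name to (true failure rate, index value).
    """
    return [name for name, (true_rate, index_rate) in rates.items()
            if not index_is_reasonable(true_rate, index_rate, max_ratio)]


# Hypothetical measured and index values.
rates = {"fault_1": (5e-8, 1e-7), "fault_2": (3e-7, 1e-7)}
print(indices_to_adjust(rates))
```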
Based on the content of the foregoing embodiments, as an optional embodiment, the fault objects needing important monitoring include faults that do not meet the preset safety margin.
In practical applications, a fault may fail to meet the preset safety margin because its index value was set unreasonably. Such faults are taken as fault objects needing important monitoring, and the true value of their failure rate is acquired; whether the failure rate index value meets the task requirement is determined according to the true value and the index value of the failure rate, and if not, the index value is adjusted.
Based on the content of the foregoing embodiment, as an alternative embodiment, determining the expected value of all failure rates includes: according to a BOM (bill of material) list of the system, the total failure rate of each device after responding to each failure physical mechanism of the space environment is obtained, and the predicted value of each failure rate is determined.
Specifically, each electronic system is composed of several pieces of equipment, each piece of equipment of circuit boards, and each circuit board of components, giving four levels in total. All components of the system can be obtained from the BOM list of the system. On the basis of all the components, the total failure rate of each component after responding to each failure physical mechanism of the space environment is acquired. As mentioned above, the failure physical mechanisms include temperature, humidity, vibration, temperature cycling, chemical environment, electrical stress, radiation and so on; the failure rate of a component is determined from its response to each failure physical mechanism under the corresponding parameter values of the space environment, and can be obtained, for example, through experiments. On this basis, the failure rate of fault i can be determined from the failure rates of the components.
According to the BOM list of the system, all electronic components of the electronic system can be considered from four levels, all failure physical mechanisms generating response are considered, and all fault risks can be covered.
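A rough sketch of walking the four-level BOM hierarchy is given below. The nested-dictionary layout, the part names, the sample rates and the rule that the failure rate of fault i is the sum of the rates of its contributing components (a simple series assumption) are all illustrative assumptions; the source does not specify how component rates are combined into a fault rate.

```python
from typing import Dict, Iterator, Set, Tuple

# equipment -> circuit board -> component -> component failure rate
# (the system itself is the top, fourth level)
Bom = Dict[str, Dict[str, Dict[str, float]]]


def iter_components(bom: Bom) -> Iterator[Tuple[str, str, str, float]]:
    """Yield (equipment, board, component, failure rate) for every component."""
    for equipment, boards in bom.items():
        for board, components in boards.items():
            for component, rate in components.items():
                yield equipment, board, component, rate


def fault_rate(bom: Bom, contributing: Set[str]) -> float:
    """ASSUMED series model: sum the rates of the contributing components."""
    return sum(rate for _, _, comp, rate in iter_components(bom)
               if comp in contributing)


# Hypothetical BOM with made-up failure rates.
bom: Bom = {
    "computer": {
        "cpu_board": {"U1_processor": 2e-7, "U2_sram": 3e-7},
        "power_board": {"Q1_mosfet": 5e-7},
    }
}
print(len(list(iter_components(bom))), fault_rate(bom, {"U1_processor", "U2_sram"}))
```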
Based on the content of the foregoing embodiments, as an alternative embodiment, obtaining the total failure rate of each device after responding to all the failed physical mechanisms includes: acquiring a single event effect fault rate, a total dose effect fault rate and a displacement damage fault rate caused by space radiation response to obtain a space radiation fault rate; acquiring the response of the components to all non-space radiation failure physical mechanisms to cause the non-space radiation failure rate; and the total sum of the space radiation fault rate and the non-space radiation fault rate is combined with the device manufacturing parameters and the process parameters to obtain the total fault rate of the components in the space environment.
Considering the response of each component to each physical failure mechanism, embodiments of the present invention may determine the failure rate of each component according to the following method:
First, the radiation case and the non-radiation case are considered separately:
$\lambda_{space} = \lambda_{SEE} + \lambda_{TID} + \lambda_{DD}$
where $\lambda_{space}$ is the failure rate due to space radiation, $\lambda_{SEE}$ is the failure rate caused by the single event effect, $\lambda_{TID}$ is the failure rate due to the total dose effect, and $\lambda_{DD}$ is the failure rate due to displacement damage; the specific methods for acquiring each of these failure rates are prior art.
$\lambda_{physical} = \lambda_{NOspace} + \lambda_{space}$
where $\lambda_{NOspace}$ is the failure rate due to non-space-radiation mechanisms, obtainable by the prior art, and $\lambda_{physical}$ is the total failure rate due to the failure physical mechanisms.
For details, see the document "A Method of Space Radiation Environment Reliability Prediction".
The total failure rate of each device in response to each failure physical mechanism of the space environment can be obtained according to the following formula:
$\lambda = \lambda_{physical} \times \Pi_{PM} \times \Pi_{Process}$
where $\lambda$ is the failure rate of a single component in the space environment; $\Pi_{PM}$ is the device manufacturing parameter, representing the influence of device manufacturing quality and technology; and $\Pi_{Process}$ is the process parameter, representing the quality and technical control of the product development, manufacturing and use processes of the device.
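The component-level calculation can be condensed into one function. This is a minimal sketch that assumes the radiation contributions add, as the text's "total sum" wording suggests; the function name, argument names and sample numbers are hypothetical.

```python
def component_failure_rate(
    lam_see: float,      # single event effect failure rate
    lam_tid: float,      # total dose effect failure rate
    lam_dd: float,       # displacement damage failure rate
    lam_nospace: float,  # failure rate from non-space-radiation mechanisms
    pi_pm: float,        # device manufacturing parameter
    pi_process: float,   # process parameter
) -> float:
    """Failure rate of a single component in the space environment."""
    lam_space = lam_see + lam_tid + lam_dd    # space radiation failure rate
    lam_physical = lam_nospace + lam_space    # all failure physical mechanisms
    return lam_physical * pi_pm * pi_process


# Made-up inputs, chosen only to make the arithmetic easy to follow.
rate = component_failure_rate(1e-8, 2e-8, 3e-8, 4e-8, pi_pm=1.5, pi_process=2.0)
print(rate)
```

In this example the physical failure rate is 1e-7 and the two multipliers scale it by a factor of 3.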
Once the failure rates of all components have been obtained in this way, the availability of the system is obtained according to the method described above.
Based on the above embodiments, through the two verifications among the index value, the predicted value and the true value, and through adjustment of the mitigation measures or the design scheme according to the availability formula, omissions and gaps in the control of changing risks can be found, the critical items affecting the task index requirements can be identified, and defense-in-depth measures can be supplemented. In principle, multiple iterations can either achieve zero risk or expose the bottlenecks that prevent zero risk and make the risk state clear, thus achieving the goal of zero risk in principle.
Fig. 2 is a block diagram of a risk control device of a high-safety electronic system according to an embodiment of the present invention. As shown in Fig. 2, the risk control device includes: a fault determining module 200, a task decision module 201, a defense-in-depth judgment module 202, a design margin judgment module 203, a critical fault determining module 204, and a monitoring configuration module 205. The fault determining module 200 is configured to obtain all faults under each type of functional failure state according to the types of functional failure states, thereby obtaining all faults of the system, and to determine the predicted values of all failure rates and the fault recovery times; the task decision module 201 is configured to determine that the system as a whole meets the task requirement according to the predicted values of all failure rates and the fault recovery times; the defense-in-depth judgment module 202 is configured to determine that each fault has a corresponding mitigation measure; the design margin judgment module 203 is configured to determine that each fault meets the preset safety margin according to the predicted value and the index value of its failure rate; the critical fault determining module 204 is configured to determine the critical fault types according to the degree of damage caused by each fault class and whether the damage influence is acceptable; the monitoring configuration module 205 is configured to determine that each critical fault type has a monitoring system capable of monitoring the occurrence of the fault and the recovery time of the fault.
The device embodiment provided by the embodiments of the present invention is used to implement the above method embodiments; for the specific process and details, refer to the above method embodiments, which are not repeated here.
According to the risk control device of the high-safety electronic system provided by this embodiment, all faults under each type of functional failure state are obtained according to the types of functional failure states, so that all faults of the system are obtained, and the predicted values of all failure rates and the fault recovery times are determined; this helps to capture every fault that affects task risk together with its failure rate. The system as a whole is determined to meet the task requirement according to the predicted values of all failure rates and the fault recovery times, and each fault is determined to meet the preset safety margin according to the predicted value and the index value of its failure rate, so that the risk is confined to a controlled state. Each critical fault is determined to have a monitoring system, so that all critical faults can be monitored in time, which facilitates implementation of the mitigation measures. The device thereby achieves effective risk control of the system.
Fig. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in Fig. 3, the electronic device may include: a processor 301, a communications interface 302, a memory 303 and a bus 304, wherein the processor 301, the communications interface 302 and the memory 303 communicate with each other through the bus 304. The communications interface 302 may be used for information transfer of the electronic device. The processor 301 may call logic instructions in the memory 303 to perform a method comprising: acquiring all fault classes under each type of functional failure state according to the types of the functional failure states, thereby acquiring all fault classes of the system, and determining the predicted values of all failure rates and the fault recovery times; determining that the system as a whole meets the task requirement according to the predicted values of all failure rates and the fault recovery times; determining that each fault has a corresponding defense-in-depth strategy; determining that each type of fault meets a preset safety margin according to the predicted value and the index value of its failure rate; determining the critical fault types according to the degree of damage caused by each fault type and whether the damage influence is acceptable; and determining that each critical fault type is provided with a monitoring system capable of monitoring the occurrence of the fault and the recovery time of the fault.
In addition, the logic instructions in the memory 303 may be implemented in the form of software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the above-described method embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the risk control method provided in the foregoing embodiments, the method including, for example: acquiring all fault classes under each type of functional failure state according to the types of the functional failure states, thereby acquiring all fault classes of the system, and determining the predicted values of all failure rates and the fault recovery times; determining that the system as a whole meets the task requirement according to the predicted values of all failure rates and the fault recovery times; determining that each fault has a corresponding defense-in-depth strategy; determining that each type of fault meets a preset safety margin according to the predicted value and the index value of its failure rate; determining the critical fault types according to the degree of damage caused by each fault type and whether the damage influence is acceptable; and determining that each critical fault type is provided with a monitoring system capable of monitoring the occurrence of the fault and the recovery time of the fault.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for risk control of a high security electronic system, comprising:
acquiring all fault classes under each type of functional failure state according to the types of the functional failure states, acquiring all fault classes of a system, and determining the predicted values of all fault rates and the recovery time of faults;
determining that the whole system meets the task requirement according to the predicted values of all fault rates and the recovery time of the faults;
determining that each fault has a corresponding depth defense strategy;
determining that each type of fault meets a preset safety margin according to the predicted value of the fault rate and the index value of the fault rate;
determining a key fault type according to the damage degree caused by each fault type and whether the damage influence is acceptable;
and determining that each key fault type is provided with a monitoring system capable of monitoring the occurrence of the fault and the recovery time of the fault.
2. The method for risk control of a high-safety electronic system according to claim 1, wherein the determining that the system as a whole meets the task requirements according to the predicted values of all failure rates and the recovery time of the failures comprises:
according to an availability formula, if the availability of the system meets a preset threshold, the whole system meets the task requirement;
the availability formulas include:
Figure FDA0002348111190000011
accordingly, the method further comprises:
according to an availability formula, adjusting the mitigating measures of each fault so that the system availability meets a preset threshold;
where $A$ is the system availability, $i$ is the fault class, $\lambda_i$ is the failure rate of class $i$ faults, and $T_i$ is the recovery time of class $i$ faults.
3. The risk control method of a high-security electronic system according to claim 1, wherein the categories of the functional failure states include:
planned outage short term hard failures, planned outage long term hard failures, unplanned outage short term hard failures, unplanned outage long term hard failures, unplanned outage short term soft failures, and unplanned outage long term soft failures.
4. The method for risk control of a high security electronic system of claim 1, further comprising:
determining a fault object needing important monitoring, and acquiring a true value of a fault rate;
and determining whether the fault rate index value meets the task requirement or not according to the fault rate true value and the fault rate index value, and if not, adjusting the index value.
5. The risk control method of a high-safety electronic system according to claim 4, wherein the failure object of the focus monitoring includes a failure not meeting a preset safety margin.
6. The risk control method of a high-security electronic system of claim 1, wherein the determining the expected value of all failure rates comprises:
and according to a BOM list of the system, acquiring the total failure rate of each component after responding to all failure physical mechanisms of the space environment, and determining the predicted value of each failure rate.
7. The risk control method for high security electronic system according to claim 1, wherein the obtaining the total failure rate of each device after responding to all failure physical mechanisms comprises:
acquiring a single event effect fault rate, a total dose effect fault rate and a displacement damage fault rate caused by space radiation response to obtain a space radiation fault rate;
acquiring the response of the components to all non-space radiation failure physical mechanisms to cause the non-space radiation failure rate;
and the total sum of the space radiation fault rate and the non-space radiation fault rate is combined with the device manufacturing parameters and the process parameters to obtain the total fault rate of the components in the space environment.
8. A risk control device for a high security electronic system, comprising:
the fault determining module is used for acquiring all faults of each type of functional failure state according to the types of the functional failure states, acquiring all faults of the system, and determining the predicted values of all fault rates and the recovery time of the faults;
the task decision module is used for determining that the whole system meets the task requirements according to the predicted values of all fault rates and the recovery time of the faults;
the defense-in-depth judgment module is used for determining that each fault has a corresponding slowing measure;
the design margin judgment module is used for determining that each fault meets the preset safety margin according to the predicted value of the fault rate and the index value of the fault rate;
the key fault determining module is used for determining the type of the key fault according to the damage degree caused by each fault class and whether the damage influence is acceptable;
and the monitoring configuration module is used for determining that each key fault type has a monitoring system and can monitor the occurrence of the fault and the recovery time of the fault.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the risk control method of the high-security electronic system according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the risk control method of a high-security electronic system according to any one of claims 1 to 7.
CN201911403887.8A 2019-12-30 2019-12-30 Risk control method and device for high-safety electronic system Pending CN113128802A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911403887.8A CN113128802A (en) 2019-12-30 2019-12-30 Risk control method and device for high-safety electronic system

Publications (1)

Publication Number Publication Date
CN113128802A true CN113128802A (en) 2021-07-16

Family

ID=76768559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911403887.8A Pending CN113128802A (en) 2019-12-30 2019-12-30 Risk control method and device for high-safety electronic system

Country Status (1)

Country Link
CN (1) CN113128802A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115097277A (en) * 2022-06-20 2022-09-23 南方电网科学研究院有限责任公司 Atmospheric neutron accelerated irradiation test method for flexible direct current converter valve power unit
CN118229270A (en) * 2024-05-22 2024-06-21 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) External field replaceable unit dividing method, external field replaceable unit dividing device, computer equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2438163C1 (en) * 2010-05-18 2011-12-27 Войсковая Часть 32103 Method of determining periodicity of inspecting random access memory during operation in radiation conditions of cosmic space on sun-synchronous orbit
CN104143036A (en) * 2013-05-10 2014-11-12 北京圣涛平试验工程技术研究院有限责任公司 Failure rate based quantitative control method for space radiation environment reliability
CN105718713A (en) * 2015-08-31 2016-06-29 北京圣涛平试验工程技术研究院有限责任公司 Reliability analysis method of space radiation environment
CN106875105A (en) * 2017-01-23 2017-06-20 东北大学 A kind of power distribution network differentiation planing method for considering combined failure risk
CN108280597A (en) * 2018-03-02 2018-07-13 北京空间技术研制试验中心 Relative risk appraisal procedure based on assembly spacecraft
CN108408083A (en) * 2018-03-02 2018-08-17 北京空间技术研制试验中心 Manned spacecraft risk prevention system method in orbit
FR3075435A1 (en) * 2017-12-20 2019-06-21 Dassault Aviation METHOD FOR IMPLEMENTING ACTIONS IN THE DESIGN, MANUFACTURE AND OPERATION OF AN AIRCRAFT FLEET AND ASSOCIATED SYSTEM

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115097277A (en) * 2022-06-20 2022-09-23 南方电网科学研究院有限责任公司 Atmospheric neutron accelerated irradiation test method for flexible direct current converter valve power unit
CN115097277B (en) * 2022-06-20 2024-04-12 南方电网科学研究院有限责任公司 Atmospheric neutron acceleration irradiation test method for flexible direct current converter valve power unit
CN118229270A (en) * 2024-05-22 2024-06-21 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) External field replaceable unit dividing method, external field replaceable unit dividing device, computer equipment and storage medium
CN118229270B (en) * 2024-05-22 2024-09-17 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) External field replaceable unit dividing method, external field replaceable unit dividing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination