CN113128802A

CN113128802A - Risk control method and device for high-safety electronic system

Info

Publication number: CN113128802A
Application number: CN201911403887.8A
Authority: CN
Inventors: 王群勇; 严拴航; 陈冬梅
Original assignee: BEIJING SHENGTAOPING TEST ENGINEERING TECHNOLOGY RESEARCH INSTITUTE
Current assignee: BEIJING SHENGTAOPING TEST ENGINEERING TECHNOLOGY RESEARCH INSTITUTE
Priority date: 2019-12-30
Filing date: 2019-12-30
Publication date: 2021-07-16

Abstract

The embodiment of the invention provides a risk control method and a risk control device for a high-safety electronic system. The method comprises the following steps: acquiring all fault classes under each category of functional failure state, and determining the predicted failure rate and the recovery time of every fault, so as to determine that the system as a whole meets the task requirements; determining that each fault has a corresponding defense-in-depth strategy; determining, from the predicted value and the index value of the failure rate, that each fault class meets a preset safety margin; determining the key fault types according to the degree of harm caused by each fault class and whether the harmful impact is acceptable; and determining that each key fault type has a monitoring system that can monitor the occurrence of the fault and its recovery time. The method ensures that the system as a whole meets the task requirements, that each fault meets the preset safety margin so that risk is kept in a controlled state, and that all key faults can be monitored in time, which facilitates the implementation of mitigation measures.

Description

Risk control method and device for high-safety electronic system

Technical Field

The invention relates to the field of risk control of electronic systems, in particular to a risk control method and device of a high-safety electronic system.

Background

Electronic systems are subject to environmental influences, such as space radiation, while performing their tasks. The space radiation environment refers to all natural radiation from the ground up to outer space. Natural radiation environments include galactic cosmic rays, solar cosmic rays, the Earth's trapped radiation belts, and atmospheric neutron radiation. The region from the ground up to an altitude of 36000 km is the everyday working environment of electronic systems such as satellites, airplanes, and ground network computers. Spacecraft such as satellites operating above 100 km are mainly affected by galactic cosmic rays, solar cosmic rays, and the Earth's trapped radiation belts; the radiation environment of near-space aircraft between 20 km and 100 km consists mainly of neutrons, protons, electrons and the like; aircraft between 3 km and 20 km are mainly affected by atmospheric neutron radiation; and the radiation environment below 3 km and on the ground consists mainly of neutrons.

Radiation particles in a spatial radiation environment can have severe radiation damage effects on materials and radiation sensitive devices. The radiation damage effect can be classified into a Single Event Effect (SEE), a total ionization dose effect (TID), a displacement damage effect (DD), and the like according to the type of the radiation damage effect, so that the electronic device is caused to malfunction. Cumulative effects such as total dose effects, displacement damage effects, etc. can cause hard failures of semiconductor integrated circuits in electronic systems, while transient effects such as single event effects can cause hard and/or soft failures of semiconductor integrated circuits in electronic systems. Therefore, the natural space environment hazard can affect the success of the task, and a set of risk impact evaluation method for evaluating the space environment hazard on the success of the task is needed to quantitatively and accurately guide the selection of key devices of the electronic system, the protection design of key functions and the reliability improvement of the key system, effectively avoid over-design and under-design and ensure the success of the task.

Disclosure of Invention

In order to solve the above problems, embodiments of the present invention provide a method and an apparatus for risk control of a high-security electronic system.

In a first aspect, an embodiment of the present invention provides a risk control method for a high-security electronic system, including: acquiring all fault classes under each category of functional failure state, thereby acquiring all fault classes of the system, and determining the predicted failure rate and the recovery time of every fault; determining that the system as a whole meets the task requirement according to the predicted values of all failure rates and the recovery times of the faults; determining that each fault has a corresponding defense-in-depth strategy; determining that each fault class meets a preset safety margin according to the predicted value and the index value of its failure rate; determining the key fault types according to the degree of harm caused by each fault class and whether the harmful impact is acceptable; and determining that each key fault type has a monitoring system that can monitor the occurrence of the fault and its recovery time.

Further, the determining that the whole system meets the task requirement according to the predicted values of all the failure rates and the recovery time of the failure includes: according to an availability formula, if the availability of the system meets a preset threshold, the whole system meets the task requirement; the availability formulas include:

Accordingly, the method further comprises: adjusting the mitigation measures of each fault according to the availability formula so that the system availability meets the preset threshold; where A is the system availability, i is the fault class, λ_i is the failure rate of class i faults, and T_i is the recovery time of class i faults.

Further, the categories of the functional failure states include: planned outage short term hard failures, planned outage long term hard failures, unplanned outage short term hard failures, unplanned outage long term hard failures, unplanned outage short term soft failures, and unplanned outage long term soft failures.

Further, the method further comprises: determining a fault object needing important monitoring, and acquiring a true value of a fault rate; and determining whether the fault rate index value meets the task requirement or not according to the fault rate true value and the fault rate index value, and if not, adjusting the index value.

Further, the fault objects which are monitored in an important mode comprise faults which do not meet preset safety margins.

Further, the determining the predicted values of all the failure rates includes obtaining the total failure rate of each component after responding to all the failure physical mechanisms of the space environment according to a BOM list of the system, and determining the predicted value of each failure rate.

Further, obtaining the total failure rate of each component after its response to all failure physical mechanisms includes: acquiring the single event effect failure rate, the total dose effect failure rate and the displacement damage failure rate caused by the response to space radiation, to obtain the space radiation failure rate; acquiring the non-space-radiation failure rate caused by the response of the component to all non-space-radiation failure physical mechanisms; and combining the sum of the space radiation failure rate and the non-space-radiation failure rate with the device manufacturing parameters and process parameters to obtain the total failure rate of the component in the space environment.
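The summation part of the paragraph above can be sketched as follows. This is only an illustration: the function name, the dictionary of non-radiation mechanisms and all numeric values are hypothetical, and the step that combines the sum with device manufacturing and process parameters is omitted because the text does not specify how it is done.

```python
def total_failure_rate(see_rate, tid_rate, dd_rate, non_radiation_rates):
    """Sum the failure-rate contributions of one component.

    see_rate, tid_rate, dd_rate: failure rates from single event effects,
    total dose effects and displacement damage (same unit, e.g. per hour).
    non_radiation_rates: hypothetical mapping of mechanism name -> rate.
    The adjustment by manufacturing/process parameters is NOT modelled here.
    """
    space_radiation = see_rate + tid_rate + dd_rate
    non_space_radiation = sum(non_radiation_rates.values())
    return space_radiation + non_space_radiation


# Illustrative values only.
rate = total_failure_rate(
    see_rate=2e-7,
    tid_rate=1e-7,
    dd_rate=5e-8,
    non_radiation_rates={"temperature": 3e-8, "vibration": 2e-8},
)
print(f"total failure rate: {rate:.2e} per hour")
```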

In a second aspect, an embodiment of the present invention provides a risk control device for a high-security electronic system, including: the fault determining module is used for acquiring all faults of each type of functional failure state according to the types of the functional failure states, acquiring all faults of the system, and determining the predicted values of all fault rates and the recovery time of the faults; the task decision module is used for determining that the whole system meets the task requirements according to the predicted values of all fault rates and the recovery time of the faults; the defense-in-depth judgment module is used for determining that each fault has a corresponding slowing measure; the design margin judgment module is used for determining that each fault meets the preset safety margin according to the predicted value of the fault rate and the index value of the fault rate; the key fault determining module is used for determining the type of the key fault according to the damage degree caused by each fault class and whether the damage influence is acceptable; and the monitoring configuration module is used for determining that each key fault has a monitoring system and can monitor the occurrence of the fault and the recovery time of the fault.

In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the risk control method for the high-security electronic system according to the first aspect of the present invention.

In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the risk control method of the high-security electronic system according to the first aspect of the present invention.

According to the risk control method and device of the high-safety electronic system, all faults of each type of functional failure state are obtained according to the type of the functional failure state, all faults of the system are obtained, the predicted value of all fault rates and the recovery time of the faults are determined, and all faults affecting task risks and corresponding fault rates are favorably obtained. And determining that the whole system meets the task requirement according to the predicted values of all fault rates and the recovery time of the faults, and determining that each fault meets a preset safety margin according to the predicted values of the fault rates and the index values of the fault rates so that the whole system meets the task requirement, each fault meets the preset safety margin, and the risk is limited in a controlled state. And each key fault is determined to have a monitoring system, so that all faults can be monitored in time, and implementation of a slowing measure is facilitated. The method effectively realizes risk control of the system.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a flowchart illustrating a method for risk control of a high security electronic system according to an embodiment of the present invention;

FIG. 2 is a block diagram of a risk control device of a high security electronic system according to an embodiment of the present invention;

FIG. 3 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The traditional evaluation method of space radiation environment reliability at home and abroad mainly has three defects:

(1) Lack of basic theory: the quantitative influence of the space radiation effect on the success or failure of a spacecraft mission cannot currently be explained scientifically.

The existing spacecraft mainly adopts a device-level radiation evaluation method to independently perform test evaluation of a Single Event Effect (SEE), a total dose effect (TID) and a displacement damage effect (DD). The comprehensive influence of the three types of radiation effects on the device is not considered, and the comprehensive influence of the three types of radiation effects on an electronic system of the spacecraft is not considered. At present, a Radiation Design Margin (RDM) method is mainly adopted, and a close quantitative relation is not established between the RDM method and the task indexes of the spacecraft, so that the comprehensive capacity of the three types of radiation effects on meeting the requirements of the spacecraft on completing specific tasks cannot be accurately and quantitatively evaluated. Due to the lack of basic theory, the quantitative influence relationship of the current space radiation effect on the success or failure of the aerospace vehicle task cannot be scientifically explained.

(2) Lack of an evaluation method: the quantitative influence of the space radiation effect on the success or failure of a spacecraft mission cannot be calculated.

Existing engineering methods for predicting the reliability of aerospace electronic payloads, such as GJB299 and MIL-HDBK-217, do not cover space radiation environmental stress, so a certain task risk exists. The MIL-HDBK-217F electronic equipment reliability prediction handbook itself notes that none of its models can be used to predict nuclear survivability or the effects of ionizing radiation. The recommended value in its microcircuit failure rate prediction model is 0.5, which treats the space environment as benign and is inconsistent with the actual space radiation environment. Without a reliability theory and corresponding tools to guide the selection of devices that meet specific task index requirements, satellite space radiation reliability may be over-designed in some places and under-designed in others at the same time, which creates task risk. Therefore, due to the lack of an evaluation method, the quantitative impact of the space radiation effect on the success or failure of a spacecraft mission cannot be calculated.

(3) Lack of effective data: the quantitative influence of the space radiation effect on the success or failure of a spacecraft mission cannot be calculated accurately.

Existing spacecraft mainly use device-level radiation evaluation, testing the Single Event Effect (SEE), the total dose effect (TID) and the displacement damage effect (DD) independently. The single event effect requirement for key devices is that the LET threshold for heavy-ion-induced single event latch-up exceeds 75 MeV·cm²/mg and the LET threshold for heavy-ion-induced single event upset exceeds 37 MeV·cm²/mg, or that the upset rate is below 10^-7 per bit per day. The total dose requirement for key devices is typically a total dose tolerance above 100 rad, and the displacement damage requirement is typically a tolerance above an equivalent 10 MeV proton fluence of 10¹⁰/cm². Data with such different dimensions cannot be superposed into a combined effect; they are not the failure rate at which the space radiation effect causes a spacecraft mission to fail, and are therefore not effective data for judging mission success or failure. Therefore, due to the lack of effective data, the quantitative impact of the space radiation effect on the success or failure of a spacecraft mission cannot be calculated accurately.

In a word, due to the lack of basic theory, evaluation method and effective data, the success or failure of the task of the current in-orbit spacecraft cannot be scientifically explained.

A method for testing and evaluating the space radiation reliability of aerospace systems is therefore urgently needed, one that avoids as far as possible the widespread coexistence of over-design and under-design, and that makes the influence of space radiation environment effects on the risk of spacecraft mission failure quantifiable, understood, and controllable.

Fig. 1 is a flowchart of a risk control method for a high-security electronic system according to an embodiment of the present invention, and as shown in fig. 1, the embodiment of the present invention provides a risk control method for a high-security electronic system, including:

S0: acquiring all fault classes under each category of functional failure state, thereby acquiring all faults of the system, and determining the predicted failure rate and the recovery time of every fault.

Electronic systems in high-safety application fields such as nuclear power stations, satellites and airplanes can enter various functional failure states (FFC) under the action of 7 stresses (temperature, humidity, electricity, chemistry, vibration, temperature cycling and radiation). When the aggregate frequency λ of these functional failure states exceeds an industry-recognized acceptable red line threshold (e.g., 10^-9/h), a safety risk arises.
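The red line comparison described above can be illustrated with a short sketch. It assumes that the aggregate frequency is the plain sum of the per-state frequencies, which the text does not state explicitly; the state names and numeric values are hypothetical.

```python
RED_LINE_PER_HOUR = 1e-9  # example threshold quoted in the text


def exceeds_red_line(state_frequencies, red_line=RED_LINE_PER_HOUR):
    """Return (aggregate frequency, whether it exceeds the red line).

    state_frequencies: hypothetical mapping of functional failure state
    name -> frequency per hour. Aggregation by summation is an assumption.
    """
    aggregate = sum(state_frequencies.values())
    return aggregate, aggregate > red_line


# Illustrative values only.
aggregate, at_risk = exceeds_red_line({"state_1": 4e-10, "state_2": 7e-10})
print(f"aggregate = {aggregate:.2e}/h, safety risk: {at_risk}")
```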

The method is used to evaluate the electronic system before it performs a task in the space environment, so that the risk of performing the task is minimized, ideally to zero. The space environment may cause a space radiation effect (SRE) or a space non-radiation effect in the electronic components of a system located in it. For example, radiation effects include charge-discharge effects and single event effects, which degrade the accuracy of electronic devices and cause errors. The space environment of the electronic system for the task under evaluation may be obtained first, including data such as temperature, humidity, the concentration of each compound in the chemical environment, and radiation intensity. A task is a task performed by the electronic system, such as an aviation task, an aerospace task, or a nuclear power plant monitoring task.

To facilitate determining all faults, the embodiment of the present invention first determines the categories of functional failure states.

As a preferred example, the categories of functional failure states include: planned outage short term hard failures, planned outage long term hard failures, unplanned outage short term hard failures, unplanned outage long term hard failures, unplanned outage short term soft failures (with monitoring), and unplanned outage long term soft failures (without monitoring). That is, there are 6 categories of functional failure states, each of which contains multiple fault types.

A planned interruption is one that can be predicted in advance; for example, the depletion of an electronic system's power supply is anticipated when the system is designed. An unplanned interruption is one that cannot be predicted in advance. Planned and unplanned interruptions are each divided into long-term and short-term interruptions, depending on whether a backup exists or the interruption can be repaired after it occurs. For example, an interruption handled by switching to a backup battery is a short-term planned interruption. System soft errors are divided into short-term soft failures with monitoring and long-term soft failures without monitoring, depending on whether a monitoring system (BIST) is provided. The first four categories are hard failures and the last two are soft failures.
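The six categories can be viewed as combinations of three attributes (planned or unplanned, short or long term, hard or soft), with soft failures occurring only as unplanned outages. The sketch below encodes that reading; the class and function names are hypothetical and are not part of the described method.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class FailureStateCategory:
    planned: bool    # True: planned outage, False: unplanned outage
    long_term: bool  # True: long term, False: short term
    hard: bool       # True: hard failure, False: soft failure

    @property
    def label(self):
        outage = "planned" if self.planned else "unplanned"
        term = "long" if self.long_term else "short"
        kind = "hard" if self.hard else "soft"
        return f"{outage} outage {term} term {kind} failures"


def six_categories():
    """Enumerate the six categories in the order listed in the text."""
    hard = [FailureStateCategory(p, lt, True)
            for p, lt in product([True, False], [False, True])]
    # Soft failures appear only as unplanned outages in the text.
    soft = [FailureStateCategory(False, lt, False) for lt in (False, True)]
    return hard + soft


for category in six_categories():
    print(category.label)
```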

In the space environment, various failure physical mechanisms can affect the electronic system (including effects produced inside the components) and cause component faults. The failure physical mechanisms include temperature, humidity, vibration, temperature cycling, chemical environment, electrical stress, radiation, and the like. Under each classification result, the failure probability caused by the response to each failure physical mechanism is acquired. The recovery time of a fault is the time from the occurrence of the fault to recovery from it, and is determined by the countermeasure designed for that fault, for example the time needed to switch to a backup battery after the charge is depleted. For the 6 categories of functional failure states (FFC), the embodiment of the invention uses the RIDM 5-step analysis method to determine each specific fault i in the 6 categories together with its failure rate λ_i and failure recovery time T_i.

Within the 6 categories of functional failure states, all faults of each category may be further acquired, for example by classifying according to effect (neutron effect, proton effect), stress (the 7 kinds of stress above), task profile, and the like. The risk control analysis of the system is then performed step by step according to the RIDM 5-step analysis method.

S1: determining that the system as a whole meets the task requirement according to the predicted values of all failure rates and the recovery times of the faults.

After all faults i, their failure rates λ_i and their failure recovery times T_i have been determined, the overall compliance of the system is determined from these three quantities.

As an alternative embodiment, the determination is made according to the following formula:

where A is the system availability, i is the fault class, λ_i is the failure rate of class i faults, and T_i is the recovery time of class i faults.

That is, once all λ_i have been determined, the availability of the system as a whole can be calculated from the corresponding T_i. For example, if the preset threshold is 0.957, the value of A calculated from the formula must satisfy this threshold for the system as a whole to be considered to meet the task requirement.
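The availability formula itself is not reproduced in this text, so the sketch below is not the formula of the invention. It assumes a common first-order approximation, A = 1 - Σ λ_i·T_i, chosen only because it depends on the same quantities (λ_i and T_i) that the text names. The function names and the numeric values are hypothetical; the threshold 0.957 is the example given above.

```python
def availability(faults):
    """Assumed first-order availability: A = 1 - sum(lambda_i * T_i).

    faults: iterable of (failure_rate_per_hour, recovery_time_hours).
    The formula is an assumption; the source formula is not shown.
    """
    return 1.0 - sum(rate * recovery for rate, recovery in faults)


def meets_task_requirement(faults, threshold=0.957):
    """Compare the assumed availability with the preset threshold."""
    return availability(faults) >= threshold


# Illustrative values only: three fault classes, each contributing 0.01.
faults = [(1e-3, 10.0), (2e-3, 5.0), (5e-4, 20.0)]
print(f"A = {availability(faults):.3f}, ok = {meets_task_requirement(faults)}")
```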

S2: determining that each fault has a corresponding defense-in-depth strategy.

The recovery time of a fault is obtained from the protective mitigation measures adopted during system design, and the defense-in-depth strategy comprises these protective mitigation measures. After it has been determined that the system as a whole meets the task requirement, it is further checked whether each fault has a corresponding defense-in-depth strategy. If a fault has none, one is added.

S3: determining that each fault class meets the preset safety margin according to the predicted value and the index value of the failure rate.

For the failure rate of each fault there are corresponding values λ_i-spec, λ_i-pre and λ_i-real, denoting the index value, the predicted value and the actual value, respectively. According to the embodiment of the invention, whether each fault meets the preset safety margin is determined from the predicted value and the index value. For example, if the predicted failure rate of a certain fault class is 10^-5 and the index value required by the task is 10^-4, the ratio of the index value to the predicted value is 10, so the system has a 10-fold safety margin for that fault; if this ratio is greater than the preset value, the preset safety margin is met. All faults need to meet the safety margin.
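The margin check in step S3 can be sketched as below, taking the safety margin to be the index value divided by the predicted value, which is the reading consistent with the 10-fold example in the text. The function names, the fault names and the required margin of 2 are illustrative assumptions.

```python
def safety_margin(index_value, predicted_value):
    """Safety margin as index value / predicted value (assumed reading)."""
    return index_value / predicted_value


def faults_below_margin(faults, required_margin):
    """Return names of faults whose margin is below the required margin.

    faults: hypothetical mapping name -> (index_value, predicted_value).
    """
    return [name for name, (spec, pre) in faults.items()
            if safety_margin(spec, pre) < required_margin]


# Illustrative values only; the first entry reproduces the text's example.
faults = {
    "fault_a": (1e-4, 1e-5),  # margin of about 10
    "fault_b": (1e-6, 8e-7),  # margin of about 1.25
}
print(faults_below_margin(faults, required_margin=2.0))
```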

S4: determining the key fault types according to the degree of harm caused by each fault class and whether the harmful impact is acceptable.

First, the key fault types that produce a functional failure state may be determined according to the consequences or degree of harm caused by a fault. The embodiment of the invention does not specifically limit how key faults are obtained from the degree of harm caused by each fault class; the methods include, but are not limited to, determination according to any one or a combination of the index requirement value of the fault, the predicted failure rate, the recovery time of the mitigation measure, and the safety margin of the fault. A threshold may be set for a single index parameter or for a combination of index parameters, and the key faults selected accordingly.

Determining the key fault types according to the degree of harm caused by each fault class and whether the harmful impact is acceptable includes determining the degree of harm from the index requirement value and determining the key fault types accordingly. For fault classes with a high degree of harm, the index requirement value is usually set lower, for example 10^-9; fault classes whose failure rate index requirement value is less than 10^-6 can be selected as key fault types.

Determining the key fault types according to the degree of harm caused by each fault class and whether the harmful impact is acceptable may also be based on the predicted failure rate, for example by selecting fault classes whose predicted failure rate is greater than 10^-6 as key fault types. Alternatively, it may be based on the safety margin, for example by treating fault types with a safety margin of less than 2 times as key faults. Alternatively, key faults may be obtained from the availability formula, as those for which the product of the predicted failure rate and the failure recovery time is greater than a preset threshold, or the key fault types may be determined comprehensively according to the system availability.

Second, the key fault types may be determined according to whether the harmful impact is acceptable. Whether the harmful impact is acceptable may be determined from the safety margin: if the safety margin is greater than a preset threshold, the harmful impact of the fault class is acceptable; otherwise it is not. Finally, the key fault types can be determined comprehensively from both the degree of harm and whether the harmful impact is acceptable. For example, according to the degree of harm, faults are divided by index requirement value into 10^-9 (class A), 10^-7 (class B) and 10^-5 (class C), with the degree of harm decreasing from class A to class C. Class A faults can be selected as key faults, and whether the harmful impact is acceptable can then be considered on this basis. For example, if the predicted value for a certain class A fault is 10^-10, the safety margin is 10 times, so the harmful impact is considered acceptable and the fault is not treated as a key fault; if the predicted value for a certain class B fault is 10^-7, the safety margin is only 1 time, so the harmful impact is considered unacceptable and the fault is treated as a key fault.
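The class A and class B examples above can be reproduced with the following sketch. It is one simplified reading of the combined rule, not the definitive procedure: the hazard class is derived by binning the index requirement value, and a fault is treated as key whenever its safety margin is below a minimum. The bin boundaries, the minimum margin of 2 (borrowed from the 2-times example earlier in the text) and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Fault:
    name: str               # hypothetical identifier
    index_value: float      # required failure-rate index value
    predicted_value: float  # predicted failure rate


def hazard_class(fault):
    """Bin the index requirement value into class A/B/C (assumed bins)."""
    if fault.index_value <= 1e-9:
        return "A"
    if fault.index_value <= 1e-7:
        return "B"
    return "C"


def assess(fault, min_margin=2.0):
    """Return (hazard class, safety margin, is_key) for one fault."""
    margin = fault.index_value / fault.predicted_value
    acceptable = margin >= min_margin
    # Simplified rule: acceptability decides; the class is reported so a
    # stricter per-class rule could be applied by the reader.
    return hazard_class(fault), margin, not acceptable


for f in (Fault("a_fault", 1e-9, 1e-10), Fault("b_fault", 1e-7, 1e-7)):
    cls, margin, is_key = assess(f)
    print(f"{f.name}: class {cls}, margin {margin:.1f}, key fault: {is_key}")
```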

S5: determining that each key fault type has a monitoring system that can monitor the occurrence of the fault and its recovery time.

That is, it is ensured that each key fault i can be monitored and that data on its recovery time T_i can be obtained. Each key fault i is checked for a monitoring system (BIST); if none exists, a corresponding monitoring system is provided. The monitoring system monitors the occurrence of fault i; once the fault is detected, it is recovered through the preset mitigation measure and the recovery time T_i is obtained. The design value of T_i may then be evaluated and adjusted based on the actual recovery time of the fault.
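A minimal sketch of the two checks in step S5 follows: finding key faults that lack a monitoring system, and comparing observed recovery times with the design value of T_i. The function names and fault names are hypothetical, and using the mean of the observed times is an assumption; the text only says the design value may be evaluated against the actual recovery time.

```python
def monitoring_gaps(key_faults, monitored_faults):
    """Return key faults that have no monitoring system (order preserved)."""
    monitored = set(monitored_faults)
    return [fault for fault in key_faults if fault not in monitored]


def recovery_time_review(design_time, observed_times):
    """Return (mean observed recovery time, whether it exceeds the design).

    Averaging the observations is an illustrative choice, not the source's.
    """
    mean_observed = sum(observed_times) / len(observed_times)
    return mean_observed, mean_observed > design_time


# Illustrative values only.
print(monitoring_gaps(["seu_cpu", "latchup_fpga"], ["seu_cpu"]))
print(recovery_time_review(2.0, [1.5, 3.5]))
```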

According to the risk control method of the high-safety electronic system, all faults of each type of functional failure state are obtained according to the type of the functional failure state, all faults of the system are obtained, the expected values of all fault rates and the recovery time of the faults are determined, and all faults affecting task risks and corresponding fault rates are favorably obtained. And determining that the whole system meets the task requirement according to the predicted values of all fault rates and the recovery time of the faults, and determining that each fault meets a preset safety margin according to the predicted values of the fault rates and the index values of the fault rates so that the whole system meets the task requirement, each fault meets the preset safety margin, and the risk is limited in a controlled state. And each key fault is determined to have a monitoring system, so that all key faults can be monitored in time, and implementation of slowing measures is facilitated. The method effectively realizes risk control of the system.

Based on the content of the foregoing embodiment, as an optional embodiment, determining that the system as a whole meets the task requirement according to the predicted values of all failure rates and the recovery times of the faults includes: according to an availability formula, if the availability of the system meets a preset threshold, the system as a whole meets the task requirement; the availability formulas include:

If not, the corresponding protective mitigation measures can be adjusted, changing their recovery times T_i until a value of A that meets the preset condition is obtained, thereby achieving risk control of the system. That is, once all λ_i have been determined, T_i is adjusted by adjusting the mitigation strategy so that A meets the preset condition, which achieves overall risk control of the system.

The availability threshold may be set experimentally or analytically for different task environments, according to the characteristics or requirements of the electronic system. Adjusting the recovery time of each fault so that the availability exceeds the preset threshold reduces the risk to an acceptable range, reveals omissions and discrepancies in risk control, identifies the key items that affect the task index requirements, and shows where defense-in-depth measures should be supplemented. In principle, multiple iterations can achieve zero risk, or else identify the bottlenecks that prevent zero risk, so that the risk state is clear.
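The iterative adjustment described above might look like the following sketch. It rests on two assumptions that are not in the text: the availability is approximated as A = 1 - Σ λ_i·T_i (the source formula is not reproduced here), and each iteration halves the recovery time of the fault that contributes most to unavailability. The function name, the shrink factor and the values are hypothetical.

```python
def tune_recovery_times(rates, times, threshold, shrink=0.5, max_iter=100):
    """Greedily shorten recovery times until the assumed A meets threshold.

    rates, times: per-fault failure rates and recovery times (same order).
    Returns (availability, adjusted times). If max_iter is exhausted, the
    returned availability may still be below the threshold.
    """
    times = list(times)  # work on a copy

    def current_availability():
        return 1.0 - sum(r * t for r, t in zip(rates, times))

    for _ in range(max_iter):
        a = current_availability()
        if a >= threshold:
            return a, times
        # Pick the fault with the largest lambda_i * T_i contribution.
        worst = max(range(len(times)), key=lambda i: rates[i] * times[i])
        times[worst] *= shrink
    return current_availability(), times


# Illustrative values only: A starts at 0.94 and must reach 0.957.
a, adjusted = tune_recovery_times([1e-3, 2e-3], [40.0, 10.0], threshold=0.957)
print(f"A = {a:.3f}, adjusted recovery times = {adjusted}")
```

In practice a recovery time cannot be shortened arbitrarily, so a real analysis would bound each T_i by what the mitigation measure can achieve; a fault that hits its bound is one of the bottlenecks the text refers to.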

Based on the content of the foregoing embodiment, as an optional embodiment, the preset classification manner includes: planned outage short term hard failures, planned outage long term hard failures, unplanned outage short term hard failures, unplanned outage long term hard failures, unplanned outage short term soft failures (with monitoring), and unplanned outage long term soft failures (without monitoring). The details have been mentioned in the above embodiments, and are not described herein again. The method for dividing the abnormal types can analyze each type of abnormal types in a targeted manner, and is beneficial to the classification of fault types.

As an optional embodiment, planned outage short term hard failures may be excluded from the faults i and their corresponding fault rates when determining whether the system as a whole meets the task requirements.
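
As an illustration, the six categories and the optional exclusion of planned outage short term hard failures could be represented as below. The enum member names and the function faults_for_task_check are hypothetical and are introduced only for this sketch.

```python
from enum import Enum


class FailureCategory(Enum):
    PLANNED_SHORT_HARD = "planned outage short term hard failure"
    PLANNED_LONG_HARD = "planned outage long term hard failure"
    UNPLANNED_SHORT_HARD = "unplanned outage short term hard failure"
    UNPLANNED_LONG_HARD = "unplanned outage long term hard failure"
    UNPLANNED_SHORT_SOFT = "unplanned outage short term soft failure (with monitoring)"
    UNPLANNED_LONG_SOFT = "unplanned outage long term soft failure (without monitoring)"


def faults_for_task_check(faults, exclude_planned_short_hard=True):
    """Return the faults that enter the overall task-requirement check.

    `faults` is an iterable of (name, FailureCategory) pairs.
    """
    return [
        (name, category)
        for name, category in faults
        if not (exclude_planned_short_hard
                and category is FailureCategory.PLANNED_SHORT_HARD)
    ]
```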

Based on the content of the foregoing embodiment, as an optional embodiment, the method further includes: determining the fault objects that require focused monitoring and acquiring the true value of their fault rate; and determining, according to the true value and the index value of the fault rate, whether the fault rate index value meets the task requirement, and if not, adjusting the index value.

For some fault types it is necessary to further judge whether the index value is set reasonably; that is, these fault objects require focused monitoring. The fault objects under focused monitoring may be all faults or only the key fault types, which may be determined according to the degree of influence caused by the faults.

The true value of the fault rate can be determined from test results of the system, or of the components forming the system, in the space environment. The index value is then checked against the true value to verify whether the index requirement meets the task requirement; that is, whether the index is reasonable is verified by comparing the ratio of the true value to the index value with a preset threshold. If the index value is not reasonable, it is adjusted accordingly, and the adjusted index value is used in the RIDM 5-step analysis method. Through multiple iterations, zero risk can in principle be achieved, or the bottlenecks preventing zero risk can be found and the risk state clarified.
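
The check of the index value against the true value can be sketched as follows. The text only states that the ratio of the true value to the index value is compared with a preset threshold; the direction of the comparison, the default threshold of 1.0, the adjustment rule and the function names are assumptions made for this illustration.

```python
def check_index_value(true_rate, index_rate, max_ratio=1.0):
    """Return (is_reasonable, ratio) for one fault under focused monitoring.

    ASSUMPTION: the index value counts as reasonable when
    true_rate / index_rate <= max_ratio.
    """
    if index_rate <= 0:
        raise ValueError("index_rate must be positive")
    ratio = true_rate / index_rate
    return ratio <= max_ratio, ratio


def adjust_index_value(true_rate, index_rate, max_ratio=1.0):
    """Return the index value to use in the next iteration.

    HYPOTHETICAL rule: keep a reasonable index value unchanged; otherwise
    move it to the smallest value that would pass the check.
    """
    reasonable, _ = check_index_value(true_rate, index_rate, max_ratio)
    return index_rate if reasonable else true_rate / max_ratio
```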

Based on the content of the foregoing embodiments, as an optional embodiment, the fault objects under focused monitoring include faults that do not meet the preset safety margin.

In practice, a fault may fail to meet the preset safety margin because the index value is set unreasonably. Such faults are taken as fault objects requiring focused monitoring, and the true value of their fault rate is acquired; whether the fault rate index value meets the task requirement is then determined according to the true value and the index value of the fault rate, and if not, the index value is adjusted.

Based on the content of the foregoing embodiment, as an optional embodiment, determining the predicted values of all failure rates includes: according to the BOM (bill of materials) of the system, obtaining the total failure rate of each component after responding to each failure physical mechanism of the space environment, and determining the predicted value of each failure rate.

Specifically, each electronic system is divided into four levels: the system, the pieces of equipment making up the system, the circuit boards of each piece of equipment, and the components on each circuit board. All components of the system can be obtained from the BOM of the system. On this basis, the total failure rate of each component after responding to each failure physical mechanism of the space environment is obtained. As mentioned above, the response of each failure physical mechanism (temperature, humidity, vibration, temperature cycling, chemical environment, electrical stress, radiation, and so on) is determined according to the corresponding parameter values of the space environment, and the corresponding component failure rate can be obtained, for example, through experiments. The failure rate of fault i can then be determined from the failure rates of the individual components.

By working from the BOM of the system, all electronic components of the electronic system are considered across the four levels, and all failure physical mechanisms that produce a response are considered, so that all fault risks can be covered.
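
The four-level BOM traversal can be sketched as follows. The nested-dictionary layout, the function names and the rule that the rate of fault i is the sum of the rates of its contributing components (a series model) are assumptions for illustration; the text does not specify how component rates are combined.

```python
def iter_components(system_bom):
    """Yield (equipment, board, component, rate) from a nested BOM.

    Assumed layout: {equipment: {board: {component: failure_rate}}},
    i.e. the levels below the system itself.
    """
    for equipment, boards in system_bom.items():
        for board, components in boards.items():
            for component, rate in components.items():
                yield equipment, board, component, rate


def fault_rate(system_bom, contributing_components):
    """Predicted rate of one fault, ASSUMING a series model (simple sum)."""
    return sum(
        rate
        for _, _, component, rate in iter_components(system_bom)
        if component in contributing_components
    )
```

For example, a BOM with one piece of equipment, two boards and three components yields three tuples from iter_components, and fault_rate adds up only the components listed for the fault in question.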

Based on the content of the foregoing embodiments, as an optional embodiment, obtaining the total failure rate of each component after responding to all failure physical mechanisms includes: acquiring the single event effect failure rate, the total dose effect failure rate and the displacement damage failure rate caused by the response to space radiation, to obtain the space radiation failure rate; acquiring the non-space-radiation failure rate caused by the response of the component to all non-space-radiation failure physical mechanisms; and combining the sum of the space radiation failure rate and the non-space-radiation failure rate with the device manufacturing parameter and the process parameter to obtain the total failure rate of the component in the space environment.

Considering the response of each component to each failure physical mechanism, embodiments of the present invention may determine the failure rate of each component as follows:

First, consider the radiation case and the non-radiation case separately:

λ_space = λ_SEE + λ_TID + λ_DD;

where λ_space is the failure rate caused by space radiation, λ_TID is the failure rate caused by the total dose effect, λ_DD is the failure rate caused by displacement damage, and λ_SEE is the failure rate caused by single event effects; the specific methods for obtaining each of these failure rates belong to the prior art.

λ_physical = λ_NOspace + λ_space;

where λ_NOspace is the failure rate caused by non-space-radiation mechanisms, which can be obtained by the prior art, and λ_physical is the total failure rate caused by the failure physical mechanisms.

For details, see the document "A Method of Space Radiation Environment Reliability Prediction".

The total failure rate of each component after responding to each failure physical mechanism of the space environment can be obtained according to the following formula:

λ = λ_physical × Π_PM × Π_Process;

where λ is the failure rate of a single component in the space environment; Π_PM is the device manufacturing parameter, representing the quality and technical influence of device manufacturing; and Π_Process is the process parameter, representing the quality and technical control of the development, manufacturing and use of the product.
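
The three formulas above can be combined into a short calculation. The function and argument names are illustrative only; how the individual rates and the two Π factors are obtained is outside the scope of this sketch.

```python
def space_radiation_rate(lam_see, lam_tid, lam_dd):
    """lambda_space = lambda_SEE + lambda_TID + lambda_DD."""
    return lam_see + lam_tid + lam_dd


def physical_rate(lam_nospace, lam_space):
    """lambda_physical = lambda_NOspace + lambda_space."""
    return lam_nospace + lam_space


def component_rate(lam_see, lam_tid, lam_dd, lam_nospace, pi_pm, pi_process):
    """lambda = lambda_physical * Pi_PM * Pi_Process for one component."""
    lam_space = space_radiation_rate(lam_see, lam_tid, lam_dd)
    return physical_rate(lam_nospace, lam_space) * pi_pm * pi_process
```

With both Π factors equal to 1, the result reduces to λ_physical, which makes the role of the two factors as multiplicative corrections easy to see.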

Once the failure rates of all components have been obtained, the availability of the system is obtained according to the method described above.

Based on the above embodiments, through the two verifications among the index value, the predicted value and the true value, together with the adjustment of the mitigation measures or the design scheme by means of the availability formula, omissions and gaps in risk control can be found, the key items affecting the task index requirements can be identified, and defense-in-depth measures can be supplemented. In principle, multiple iterations can achieve zero risk, or reveal the bottlenecks that prevent zero risk and make the risk state clear, thus achieving the goal of zero risk in principle.

Fig. 2 is a block diagram of a risk control device of a high-safety electronic system according to an embodiment of the present invention. As shown in Fig. 2, the risk control device of the high-safety electronic system includes: a fault determining module 200, a task decision module 201, a defense-in-depth judgment module 202, a design margin judgment module 203, a key fault determining module 204, and a monitoring configuration module 205. The fault determining module 200 is configured to acquire all faults under each type of functional failure state according to the types of the functional failure states, thereby acquiring all faults of the system, and to determine the predicted values of all fault rates and the recovery time of the faults; the task decision module 201 is configured to determine that the system as a whole meets the task requirements according to the predicted values of all fault rates and the recovery time of the faults; the defense-in-depth judgment module 202 is configured to determine that each fault has a corresponding mitigation measure; the design margin judgment module 203 is configured to determine that each fault meets the preset safety margin according to the predicted value of the fault rate and the index value of the fault rate; the key fault determining module 204 is configured to determine the key fault types according to the degree of damage caused by each fault class and whether the damage is acceptable; the monitoring configuration module 205 is configured to determine that a monitoring system is provided for each key fault, which can monitor the occurrence of the fault and the recovery time of the fault.

The device embodiment provided by the embodiments of the present invention implements the above method embodiments; for the specific process and details, refer to the above method embodiments, which are not repeated here.

According to the risk control device of the high-safety electronic system provided by the embodiment of the present invention, all faults under each type of functional failure state are acquired according to the types of the functional failure states, so that all faults of the system are acquired, and the predicted values of all fault rates and the recovery time of the faults are determined, which helps to obtain all faults affecting task risk and their corresponding fault rates. The system as a whole is determined to meet the task requirement according to the predicted values of all fault rates and the recovery time of the faults, and each fault is determined to meet a preset safety margin according to the predicted value and the index value of the fault rate, so that the system as a whole meets the task requirement, each fault meets the preset safety margin, and the risk is kept in a controlled state. Each key fault is determined to have a monitoring system, so that all key faults can be monitored in time, which facilitates the implementation of mitigation measures. Risk control of the system is thereby effectively realized.

Fig. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in Fig. 3, the electronic device may include: a processor 301, a communications interface 302, a memory 303 and a bus 304, wherein the processor 301, the communications interface 302 and the memory 303 communicate with each other through the bus 304. The communications interface 302 may be used for information transfer of the electronic device. The processor 301 may call logic instructions in the memory 303 to perform a method comprising: acquiring all fault classes under each type of functional failure state according to the types of the functional failure states, thereby acquiring all fault classes of the system, and determining the predicted values of all fault rates and the recovery time of the faults; determining that the system as a whole meets the task requirement according to the predicted values of all fault rates and the recovery time of the faults; determining that each fault has a corresponding defense-in-depth strategy; determining that each type of fault meets a preset safety margin according to the predicted value of the fault rate and the index value of the fault rate; determining the key fault types according to the degree of damage caused by each fault type and whether the damage is acceptable; and determining that a monitoring system is provided for each key fault type, which can monitor the occurrence of the fault and the recovery time of the fault.

In addition, the logic instructions in the memory 303 may be implemented in the form of software functional units and, when sold or used as independent products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the above-described method embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other media capable of storing program code.

In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the risk control method provided in the foregoing embodiments, the method including, for example: acquiring all fault classes under each type of functional failure state according to the types of the functional failure states, thereby acquiring all fault classes of the system, and determining the predicted values of all fault rates and the recovery time of the faults; determining that the system as a whole meets the task requirement according to the predicted values of all fault rates and the recovery time of the faults; determining that each fault has a corresponding defense-in-depth strategy; determining that each type of fault meets a preset safety margin according to the predicted value of the fault rate and the index value of the fault rate; determining the key fault types according to the degree of damage caused by each fault type and whether the damage is acceptable; and determining that a monitoring system is provided for each key fault type, which can monitor the occurrence of the fault and the recovery time of the fault.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for risk control of a high security electronic system, comprising:

acquiring all fault classes under each type of functional failure state according to the types of the functional failure states, acquiring all fault classes of a system, and determining the predicted values of all fault rates and the recovery time of faults;

determining that the whole system meets the task requirement according to the predicted values of all fault rates and the recovery time of the faults;

determining that each fault has a corresponding defense-in-depth strategy;

determining that each type of fault meets a preset safety margin according to the predicted value of the fault rate and the index value of the fault rate;

determining a key fault type according to the damage degree caused by each fault type and whether the damage influence is acceptable;

and determining that a monitoring system is provided for each key fault type, which can monitor the occurrence of the fault and the recovery time of the fault.

2. The method for risk control of a high-safety electronic system according to claim 1, wherein the determining that the system as a whole meets the task requirements according to the predicted values of all failure rates and the recovery time of the failures comprises:

according to an availability formula, if the availability of the system meets a preset threshold, the whole system meets the task requirement;

the availability formulas include:

accordingly, the method further comprises:

according to the availability formula, adjusting the mitigation measures of each fault so that the system availability meets the preset threshold.

3. The risk control method of a high-security electronic system according to claim 1, wherein the categories of the functional failure states include:

planned outage short term hard failures, planned outage long term hard failures, unplanned outage short term hard failures, unplanned outage long term hard failures, unplanned outage short term soft failures, and unplanned outage long term soft failures.

4. The method for risk control of a high security electronic system of claim 1, further comprising:

determining the fault objects that require focused monitoring, and acquiring the true value of the fault rate;

and determining, according to the true value and the index value of the fault rate, whether the fault rate index value meets the task requirement, and if not, adjusting the index value.

5. The risk control method of a high-safety electronic system according to claim 4, wherein the fault objects that require focused monitoring include faults that do not meet the preset safety margin.

6. The risk control method of a high-security electronic system of claim 1, wherein the determining the predicted values of all fault rates comprises:

according to a BOM (bill of materials) of the system, acquiring the total fault rate of each component after responding to all failure physical mechanisms of the space environment, and determining the predicted value of each fault rate.

7. The risk control method for high security electronic system according to claim 1, wherein the obtaining the total failure rate of each device after responding to all failure physical mechanisms comprises:

acquiring the single event effect fault rate, the total dose effect fault rate and the displacement damage fault rate caused by the response to space radiation, to obtain the space radiation fault rate;

acquiring the non-space-radiation fault rate caused by the response of the component to all non-space-radiation failure physical mechanisms;

and combining the sum of the space radiation fault rate and the non-space-radiation fault rate with the device manufacturing parameter and the process parameter to obtain the total fault rate of the component in the space environment.

8. A risk control device for a high security electronic system, comprising:

the fault determining module is used for acquiring all faults of each type of functional failure state according to the types of the functional failure states, acquiring all faults of the system, and determining the predicted values of all fault rates and the recovery time of the faults;

the task decision module is used for determining that the whole system meets the task requirements according to the predicted values of all fault rates and the recovery time of the faults;

the defense-in-depth judgment module is used for determining that each fault has a corresponding mitigation measure;

the design margin judgment module is used for determining that each fault meets the preset safety margin according to the predicted value of the fault rate and the index value of the fault rate;

the key fault determining module is used for determining the type of the key fault according to the damage degree caused by each fault class and whether the damage influence is acceptable;

and the monitoring configuration module is used for determining that a monitoring system is provided for each key fault type, which can monitor the occurrence of the fault and the recovery time of the fault.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the risk control method of the high-security electronic system according to any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the risk control method of a high-security electronic system according to any one of claims 1 to 7.