CN117708739A

CN117708739A - Space payload anomaly detection method based on improved random forest

Info

Publication number: CN117708739A
Application number: CN202311734158.7A
Authority: CN
Inventors: 李鹏; 施建明; 秦泰春; 王建星; 王哲
Original assignee: Technology and Engineering Center for Space Utilization of CAS; Beijing Institute of Spacecraft Environment Engineering
Current assignee: Technology and Engineering Center for Space Utilization of CAS; Beijing Institute of Spacecraft Environment Engineering
Priority date: 2023-12-15
Filing date: 2023-12-15
Publication date: 2024-03-15

Abstract

The invention discloses a space payload anomaly detection method based on an improved random forest, comprising sample anomaly detection and equipment anomaly detection. First, in sample anomaly detection, an improved weighted-voting random forest algorithm assigns larger weights to decision trees with higher accuracy, reducing the influence of poorly performing decision trees on the result, and sample-level anomaly detection is carried out. Second, in equipment anomaly detection, the sample-level fault early-warning results are mapped to equipment-level fault early warning by a sliding window method, and the sliding window step length and the failure threshold are optimized using a cost penalty function that accounts for missed detections and false alarms, so as to balance the detection rate against the false alarm rate. By establishing a lightweight anomaly detection model, the method helps the space payload autonomously analyze on-orbit monitoring data when ground measurement and control support is unavailable, rapidly identify anomalies and locate faults, take fault handling and recovery measures, and assist planning decisions, thereby ensuring safe, reliable and stable operation of the spacecraft.

Description

Space payload anomaly detection method based on improved random forest

Technical Field

The invention relates to the technical field of space payload anomaly detection, in particular to a space payload anomaly detection method based on an improved random forest.

Background

Spacecraft missions are highly complex and operate in harsh environments, which places extremely high demands on system reliability and safety. During on-orbit operation, a spacecraft carries out wide-ranging, serialized space science and application tasks and also generates massive amounts of monitoring data. Whether these data are abnormal is closely related to the health of the payload and the execution state of the application tasks; processing and analyzing the monitoring data online in a scientific and effective manner allows abnormal events of the payload or the space task to be located quickly and reported in time. Growing system complexity, task diversity and long-term on-orbit operation create an urgent need for autonomous on-orbit health management of space payloads.

As a key technology of Prognostics and Health Management (PHM), anomaly detection refers to discovering potential anomalous patterns by mining and analyzing spacecraft monitoring data, and thereby characterizing whether the spacecraft is behaving abnormally. Common anomaly detection methods currently fall into two categories: model-driven and data-driven. Model-driven methods construct a residual signal between the output of a mathematical model and that of the actual system, and compare the residual with a set threshold to judge whether a fault has occurred. However, because the structure and function of the product are complex, an accurate mathematical model is difficult to build. Data-driven methods, by contrast, rely on artificial intelligence algorithms such as machine learning and deep learning (for example support vector machines, decision trees, neural networks, transfer learning and random forests, applied to classification, regression, ranking, dimensionality reduction and clustering). They use existing historical data to train on monitoring data from normal and fault conditions, and can complete anomaly detection tasks without a quantitative mathematical model. This overcomes shortcomings such as the large judgment deviations that come from relying on expert experience and the lack of rules in existing fault knowledge bases, and is a current research hotspot.

However, neural network models such as CNN, LSTM and DBN are computationally expensive and poorly interpretable. In particular, they need large amounts of high-quality data as training samples, whereas historical data for aerospace products, especially fault data, are limited, which greatly affects the accuracy of deep learning methods. A machine learning method is therefore adopted for detecting receiver faults.

Telemetry data contain only a very small number of positive (anomalous) samples and a large number of negative (normal) samples, and many different types of anomalies, known or unknown, may be present. Traditional classification algorithms are biased toward the majority class, resulting in low detection rates for minority-class samples. To improve the detection rate of minority-class samples, in addition to common data preprocessing such as resampling and feature processing, algorithmic improvements based on ensemble methods and cost-sensitive methods can be adopted. The random forest algorithm uses an ensemble learning strategy that combines the classification results of multiple weak classifiers, so that the overall model achieves higher accuracy and generalization performance together with good stability, and it is widely applied in various business scenarios. Cost-sensitive methods assign a higher misclassification cost to misclassified minority-class samples, which reduces the classifier's bias toward the majority class, but accurately determining the cost factors is difficult. On the other hand, traditional fault early warning based on a single criterion leads to a high false alarm rate; an excessively high false alarm rate causes ground operation and control personnel to lose confidence in the health management system and can also cause equipment to be isolated and reconfigured in error. Hierarchical progressive reasoning is therefore introduced to suppress the high false alarm rate.

Therefore, how to solve the low classification detection rate and high false alarm rate caused by the extreme imbalance between positive and negative samples in space payload telemetry data has become a problem to be solved by those skilled in the art.

Disclosure of Invention

The object of the present invention is to provide a method for detecting anomalies in a space payload based on an improved random forest, so as to solve the aforementioned problems of the prior art.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

a space payload anomaly detection method based on an improved random forest, comprising the steps of:

sample anomaly detection: using an improved weighted-voting random forest algorithm, larger weights are assigned to decision trees with higher accuracy, the influence of poorly performing decision trees on the result is reduced, and sample-level anomaly detection is carried out;

equipment anomaly detection: the sample-level fault early-warning results are mapped to equipment-level fault early warning by a sliding window method, and the sliding window step length and the failure threshold are optimized using a cost penalty function that accounts for missed detections and false alarms, so as to balance the detection rate against the false alarm rate.

Preferably, the random forest algorithm includes decision tree construction and integrated voting;

a decision tree comprises decision nodes, branches and leaf nodes;

wherein:

the decision nodes represent the features used to judge the class of the sample to be classified;

the branches represent the different values of the decision nodes;

the leaf nodes represent the final diagnosed class.

Preferably, the sample anomaly detection is performed according to the following formula:

$$H(X) = \arg\max_{c} \sum_{t=1}^{T} \eta_t \, I\big(h_t(X) = c\big)$$

wherein $c$ ranges over the classes, $T$ is the number of decision trees, $h_t(X)$ is the output of the $t$-th decision tree, $I(\cdot)$ is the indicator function, equal to 1 when its argument is true and 0 otherwise, and $\eta_t$ is the weight coefficient of the $t$-th decision tree.

Preferably, larger weights are assigned to more accurate decision trees according to a logistic model, specifically according to the following formula:

wherein $p_t$ is the accuracy and $\eta_t$ is the weight coefficient of the $t$-th decision tree.

Preferably, the Matthews correlation coefficient (MCC) is introduced as an evaluation index to better describe classification accuracy; specifically, it is calculated according to the following formula:

$$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

wherein TP is the number of samples that are truly faulty and classified as faulty, TN is the number of samples that are truly healthy and classified as healthy, FP is the number of samples that are truly healthy but classified as faulty, and FN is the number of samples that are truly faulty but classified as healthy;

MCC is substituted for $p_t$; because the MCC takes values in $[-1, 1]$, its range is adjusted accordingly, and the weight calculation model is established as follows:

preferably, the specific method of equipment anomaly detection is as follows:

for samples that have undergone anomaly detection, the sample time-series data are divided into several subsequences by a sliding window method, and the data characteristics of the subsequences are analyzed to detect whether the equipment is abnormal;

comprising the following steps:

step 1, obtaining the binary state time-series data $X_i$, $i = 1, 2, \ldots, T$, of the samples of each piece of equipment by the sample anomaly detection method based on the improved random forest; $X_i = 1$ indicates that the sample at that time is abnormal, and $X_i = 0$ indicates that the sample is normal;

step 2, performing anomaly detection segment by segment using a sliding window strategy with step length $S_w$, where $X_{t-T}$ is the sequence observed from time $t-T$ to time $t$; the number of abnormal samples $Y_t$ in the window is:

$$Y_t = \sum_{i=t-S_w+1}^{t} X_i$$

in healthy equipment, $X_i$ can be regarded as a sequence of Bernoulli trials, and the probability of an abnormal sample in the window sequence is:

$$P(X_i = 1) = p_i, \quad i = t - S_w + 1, \ldots, t,$$

the Pulse odds-ratio at time $t$ can be expressed as:

under the hypothesis that the equipment is healthy, the generalized likelihood ratio can be converted into logarithmic form:

solving the above expression gives:

given a training and/or test data set of devices, the final health state of each device, $\Omega_i \in \{0, 1\}$, $i = 1, \ldots, I$, is prior knowledge, so the training/test set can be expressed as $\{(G_1, \Omega_1), \ldots, (G_I, \Omega_I)\}$, where device $i$ contains $m$ generalized likelihood ratios $G_i = \{G_{i1}, \ldots, G_{im}\}$; when the failure threshold $F_T$ is given, the generalized likelihood ratio of device $i$ is labeled as

As can be seen from the above calculation, once $G_{ij}$ exceeds $F_T$, an abnormal state is identified, a fault alert is issued, and the time of failure is determined. As the sliding window moves, the device health state can be estimated as:

preferably, the equipment anomaly detection further includes: optimizing the parameters according to a cost-sensitive function;

the cost function is expressed as:

$$E = \lambda_1 \cdot N_{FN} + \lambda_2 \cdot N_{FP}$$

wherein $N_{FN}$ and $N_{FP}$ are respectively the number of missed detections and the number of false alarms of the equipment, and $\lambda_1$ and $\lambda_2$ are normalized adjustment coefficients between $N_{FN}$ and $N_{FP}$; by optimizing the step length $S_w$ and the failure threshold $F_T$, the cost function can be minimized, finally achieving a balance between the detection rate and the false alarm rate from the viewpoint of cost:

$$\min_{S_w,\,F_T} E(S_w, F_T)$$

subject to: $1 \le S_w \le N_i$,

$0 \le \lambda \le 1$,

wherein the cost $E(S_w, F_T)$ is an implicit function of $S_w$ and $F_T$, $G_i$ is the generalized likelihood ratio of device $i$, $I$ is the number of devices in the test set, and $N_i$ is the number of samples of device $i$.
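The parameter search described above can be sketched as a plain grid search. This is only an illustration under stated assumptions: the function names, the candidate grids and the default values of `lam1` and `lam2` are hypothetical, and the fraction of abnormal samples in each window is used as a stand-in for the generalized likelihood ratio $G_{ij}$, whose exact expression is not reproduced in this text.

```python
from itertools import product


def window_fractions(x, s_w):
    """Fraction of abnormal samples in every full window of width s_w.

    Assumption: this fraction stands in for the statistic G_ij.
    """
    return [sum(x[t - s_w + 1:t + 1]) / s_w for t in range(s_w - 1, len(x))]


def device_alarm(x, s_w, f_t):
    """A device is flagged as soon as one window statistic exceeds f_t."""
    return any(score > f_t for score in window_fractions(x, s_w))


def cost(devices, s_w, f_t, lam1=0.7, lam2=0.3):
    """E = lam1 * N_FN + lam2 * N_FP over (sequence, true_state) pairs."""
    n_fn = n_fp = 0
    for x, omega in devices:
        alarm = device_alarm(x, s_w, f_t)
        if omega == 1 and not alarm:
            n_fn += 1  # faulty device that was missed
        elif omega == 0 and alarm:
            n_fp += 1  # healthy device that raised a false alarm
    return lam1 * n_fn + lam2 * n_fp


def grid_search(devices, window_sizes, thresholds, lam1=0.7, lam2=0.3):
    """Return ((s_w, f_t), cost) for the cheapest pair on the grid."""
    best = min(product(window_sizes, thresholds),
               key=lambda p: cost(devices, p[0], p[1], lam1, lam2))
    return best, cost(devices, best[0], best[1], lam1, lam2)


devices = [
    ([0, 0, 1, 0, 0, 0, 1, 0], 0),  # healthy device with isolated false positives
    ([0, 1, 1, 1, 1, 0, 1, 1], 1),  # faulty device with sustained anomalies
]
best_params, best_cost = grid_search(devices, [1, 2, 4], [0.25, 0.5, 0.75])
```

A window of width 1 turns every isolated sample-level alarm into a device-level alarm, which is why wider windows are cheaper on this toy data. Because $E$ is only an implicit function of the two parameters, an exhaustive or heuristic search of this kind is a natural choice; the text does not state which optimizer is actually used.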

The beneficial effects of the invention are as follows:

The invention discloses a space payload anomaly detection method based on an improved random forest, comprising sample anomaly detection and equipment anomaly detection. First, in sample anomaly detection, an improved weighted-voting random forest algorithm assigns larger weights to decision trees with higher accuracy, reducing the influence of poorly performing decision trees on the result, and sample-level anomaly detection is carried out. Second, in equipment anomaly detection, the sample-level fault early-warning results are mapped to equipment-level fault early warning by a sliding window method, and the sliding window step length and the failure threshold are optimized using a cost penalty function that accounts for missed detections and false alarms, so as to balance the detection rate against the false alarm rate. By establishing a lightweight anomaly detection model, the method suits on-orbit fault management of space payloads with limited computing power and resources. When deep space exploration missions to the Moon, Mars and other destinations are carried out without ground measurement and control support, it helps to autonomously analyze on-orbit monitoring data, rapidly identify anomalies and locate faults, take fault handling and recovery measures, and assist planning decisions, ensuring safe, reliable and stable operation of the spacecraft.

Drawings

FIG. 1 is a block diagram of the two-step anomaly detection process of the space payload anomaly detection method based on an improved random forest according to the present invention;

FIG. 2 is a flow block diagram of another embodiment of the two-step anomaly detection process of the space payload anomaly detection method based on an improved random forest according to the present invention;

FIG. 3 is a flow chart of the random forest algorithm of the space payload anomaly detection method based on an improved random forest according to the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. Examples of the embodiments are illustrated in the accompanying drawings, wherein like or similar symbols indicate like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention. In the description of the present invention, it should be understood that the terms "top," "bottom," "inner," "outer," "axis," "circumferential," and the like indicate an orientation or a positional relationship based on that shown in the drawings, and are merely for convenience in describing the present invention or simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.

In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," "engaged," "hinged," and the like are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communicated with the inside of two elements or the interaction relationship of the two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.

In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "examples," "particular examples," "one particular embodiment," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Referring to FIG. 1, FIG. 2 and FIG. 3, the space payload anomaly detection method based on the improved random forest comprises sample anomaly detection, in which an improved weighted-voting random forest algorithm assigns larger weights to decision trees with higher accuracy, reduces the influence of poorly performing decision trees on the result, and carries out sample-level anomaly detection; and equipment anomaly detection, in which the sample-level fault early-warning results are mapped to equipment-level fault early warning by a sliding window method, and the sliding window step length and the failure threshold are optimized using a cost penalty function that accounts for missed detections and false alarms, so as to balance the detection rate against the false alarm rate.

In this embodiment, the invention provides a two-stage anomaly detection method based on an improved random forest, addressing the low classification detection rate caused by the extreme imbalance between positive and negative samples in space payload telemetry data and the high false alarm rate of anomaly detection based on a single criterion. The first stage is sample anomaly detection: an improved weighted-voting random forest algorithm assigns larger weights to decision trees with higher accuracy, reduces the influence of poorly performing decision trees on the result, and carries out sample-level anomaly detection. The second stage is equipment anomaly detection: the sample-level fault early-warning results are mapped to equipment-level fault early warning by a sliding window method, and the sliding window step length and the failure threshold are optimized using a cost penalty function that accounts for missed detections and false alarms, thereby easing the trade-off between the detection rate and the false alarm rate.

In some of these embodiments, the random forest algorithm includes decision tree construction and integrated voting. A decision tree comprises decision nodes, branches and leaf nodes, wherein the decision nodes represent the features used to judge the class of the sample to be classified, the branches represent the different values of the decision nodes, and the leaf nodes represent the final diagnosed class.

In this embodiment, the random forest method, proposed by Breiman in 2001, is an ensemble machine learning method. It essentially combines the Bagging (bootstrap aggregating) algorithm with the random subspace algorithm to construct a classifier composed of multiple independent decision trees, which overcomes the tendency of decision trees to overfit, yields robust predictions and avoids local convergence. This matters because big data offer many potential explanatory or predictor variables, and the explanatory variables may exhibit varying degrees of multicollinearity, so that a perturbation of the sample data may change the optimal prediction model (a combination of different explanatory variables) considerably; this is model uncertainty. The excellent performance of random forests comes mainly from the "random" and the "forest": one gives the model resistance to overfitting, and the other makes it more accurate.

"random" is mainly manifested in two aspects:

Sample perturbation: based on bootstrap sampling, each sampling set contains about 63.2% of the samples of the initial training set, which creates differences between the data sets.

Attribute perturbation: in a random forest, for each node of a base decision tree, k attributes are randomly selected from the node's feature attribute set, and the optimal attribute for splitting is then chosen from those k. This twofold randomness also makes the base models differ from one another.

The second element, "integration" (the "forest"), is embodied as follows: multiple (differing) decision trees are trained on multiple (differing) sampling sets, and simple voting or averaging is used to improve the stability and generalization ability of the model.
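The figure of about 63.2% follows from a short calculation. The symbol $n$, the size of the initial training set, is introduced here only for illustration. When $n$ samples are drawn with replacement, a given sample is missed by one draw with probability $1 - 1/n$, so the probability that it never appears in the sampling set is

$$\left(1 - \frac{1}{n}\right)^{n} \;\xrightarrow{\;n \to \infty\;}\; e^{-1} \approx 0.368,$$

and the expected fraction of distinct initial samples that do appear is therefore

$$1 - \left(1 - \frac{1}{n}\right)^{n} \approx 1 - 0.368 = 0.632.$$

The remaining fraction of roughly 36.8% forms the out-of-bag samples of the corresponding tree.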

In some embodiments, the random forest algorithm comprises the following steps:

Step 1: given an original data set N with M samples and S feature attributes, random sampling with replacement is performed by the bootstrap resampling method to generate a training subset, the number of samples in the training subset being smaller than M.

Step 2: features are randomly selected from the S feature attributes, their number being the largest positive integer less than or equal to log2 M + 1; the training subset is input, and a decision tree is generated according to a decision tree generation algorithm.

Step 3: steps 1 and 2 are repeated K times to generate K decision trees, which form the random forest.

Step 4: the test set is diagnosed with the generated decision trees, the results of all decision trees are collected, and the final sample classification is computed by the integrated voting method, giving the classification result of the random forest algorithm.
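Steps 1 to 3 can be sketched as follows, using only the sampling logic and leaving out the tree-growing algorithm itself. The function names are hypothetical. Two readings are assumed and should be treated as such: the bootstrap draws as many samples as the data set holds, so that only the number of distinct samples is smaller, and the bound log2(·) + 1 of step 2 is evaluated on the number of features, as in Breiman's original formulation, and capped at it.

```python
import math
import random


def subset_size(m):
    """Largest integer not exceeding log2(m) + 1, the bound used in step 2."""
    return int(math.log2(m) + 1)


def plan_forest(n_samples, n_features, n_trees, seed=0):
    """Draw the bootstrap sample and the feature subset for each tree."""
    rng = random.Random(seed)
    # Assumption: the bound is evaluated on the feature count and capped at it.
    k = min(n_features, subset_size(n_features))
    plans = []
    for _ in range(n_trees):
        # Step 1: sample indices with replacement (bootstrap).
        in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
        # Samples never drawn are out-of-bag and can serve as a test set.
        out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
        # Step 2: random feature subset, drawn once per tree in this sketch.
        features = sorted(rng.sample(range(n_features), k))
        plans.append({"in_bag": in_bag,
                      "out_of_bag": out_of_bag,
                      "features": features})
    return plans


plans = plan_forest(n_samples=100, n_features=16, n_trees=3)
```

The sketch draws one feature subset per tree, following the wording of step 2, whereas the attribute perturbation described earlier draws a fresh subset at every node. Each entry of `plans` would then be handed to a tree learner such as CART.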

It should be noted that the random forest algorithm has two key steps: decision tree construction and integrated voting. (1) Decision tree construction is an instance-based inductive learning method that extracts a tree-shaped classification model from a given unordered set of training samples; a decision tree consists of three parts: decision nodes, branches and leaf nodes. The decision nodes represent the features used to judge the class of the sample to be classified, the branches represent the different values of the decision nodes, and the leaf nodes represent the final diagnosed class. Commonly used decision tree construction algorithms include C4.5 and classification and regression trees (CART).

The CART algorithm recursively splits each feature by binary partitioning, dividing the feature space into a finite number of cells and determining the predicted probability distribution on those cells. CART uses the Gini coefficient to select the splitting feature. The Gini coefficient represents the uncertainty of the model: the smaller the coefficient, the lower the uncertainty and the better the feature, which is the opposite of information gain (ratio), where larger is better.

Assuming that the current sample set $D$ has $K$ classes and that the proportion of samples of the $i$-th class is $p_i$ ($i = 1, 2, \ldots, K$), where $\sum_{i=1}^{K} p_i = 1$, the Gini coefficient of the data set $D$ is

$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{K} p_i^2$$

where $\mathrm{Gini}(D)$ lies in $[0, 1]$ and reflects the probability that two samples drawn at random from $D$ have different class labels.

If a feature $a$ splits the data set $D$ into two parts $D_1$ and $D_2$, the Gini coefficient of $D$ conditioned on the feature $a$ is:

$$\mathrm{Gini}(D, a) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$

where $|D_1|$, $|D_2|$ and $|D|$ are the numbers of samples in the respective sets.

Let the candidate attribute set be $A = \{a_1, a_2, \ldots, a_M\}$. The attribute with the smallest Gini index after splitting is selected as the optimal splitting attribute, namely:

$$a^{*} = \arg\min_{a \in A} \mathrm{Gini}(D, a)$$

The random forest algorithm uses CART decision trees as its base learners.
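The split selection just described can be illustrated with a minimal sketch. The function names and the toy data are hypothetical, and numeric features with "less than or equal to threshold" splits are assumed; this is not a complete CART implementation.

```python
from collections import Counter


def gini(labels):
    """Gini coefficient 1 - sum(p_i^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(rows, labels, feature, threshold):
    """Weighted Gini coefficient after splitting on rows[i][feature] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows, labels):
    """Return (score, feature, threshold) with the smallest weighted Gini."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({x[feature] for x in rows}):
            score = gini_split(rows, labels, feature, threshold)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]]
labels = [0, 0, 1, 1]
score, feature, threshold = best_split(rows, labels)
```

On this toy data the first feature separates the two classes perfectly at a threshold of 2.0, so the weighted Gini coefficient of that split is zero, while the unsplit set has a Gini coefficient of 0.5.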

(2) Integrated voting

Ensemble (integrated) learning combines different classifiers so that they decide jointly on the same problem, in order to improve classification accuracy. To ensure that the ensemble classifier outperforms a single classifier, the following preconditions should generally be satisfied: (1) the classification accuracy of each base classifier is higher than 50%; (2) the base classifiers are as independent as possible, so that they make different classification errors.

The voting method adopted in the random forest algorithm is simple voting, also called majority voting. Every decision tree is given the same weight, the K decision trees each vote on every sample point according to their own trained diagnosis, and the class with the most votes is the classification result of the sample point. For example, for a trained random forest model with test set X, C classes and T decision trees, the model output is:

$$H(X) = \arg\max_{c \in \{1, \ldots, C\}} \sum_{t=1}^{T} I\big(h_t(X) = c\big)$$

where $h_t(X)$ is the output of the $t$-th decision tree and $I(\cdot)$ is the indicator function.

In some embodiments, the integrated voting method is as follows: every decision tree is given the same weight, the K decision trees each vote on every sample point according to their own trained diagnosis, and the class with the most votes is the classification result of the sample point.

In some embodiments, the sample anomaly detection is performed according to the following formula:

$$H(X) = \arg\max_{c} \sum_{t=1}^{T} \eta_t \, I\big(h_t(X) = c\big)$$

where $c$ ranges over the classes, $T$ is the number of decision trees, $h_t(X)$ is the output of the $t$-th decision tree, $I(\cdot)$ is the indicator function, equal to 1 when its argument is true and 0 otherwise, and $\eta_t$ is the weight coefficient of the $t$-th decision tree.

In this embodiment, every decision tree has the same voting weight in conventional random forest voting, which ignores the effect that differences in accuracy between decision trees have on the final result and lowers the overall accuracy of the random forest. To increase the influence of highly accurate decision trees and reduce that of inaccurate ones on the final result, a weighted voting method is proposed. The out-of-bag (OOB) samples of each tree are used as its test set to evaluate classification performance, and the evaluated accuracy is converted into the tree's voting weight, so that decision trees with better classification performance carry larger weights.
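The weighted vote can be sketched in a few lines. The function name and the example weights are hypothetical; how the weights are derived from out-of-bag performance is left to the weight model described in the text.

```python
from collections import defaultdict


def weighted_vote(tree_outputs, weights):
    """Return the class whose supporting trees have the largest total weight."""
    if len(tree_outputs) != len(weights):
        raise ValueError("one weight is required per decision tree")
    scores = defaultdict(float)
    for label, eta in zip(tree_outputs, weights):
        scores[label] += eta  # adds eta_t when tree t votes for this class
    return max(scores, key=scores.get)


# Three trees: one accurate tree votes "abnormal" (1), two weak trees vote "normal" (0).
weighted_result = weighted_vote([1, 0, 0], [0.9, 0.3, 0.3])
majority_result = weighted_vote([1, 0, 0], [1.0, 1.0, 1.0])
```

With equal weights the function reduces to simple majority voting, so the two calls show how a single well-performing tree can overturn the majority once weights are introduced.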

In some embodiments, to assign larger weights to more accurate decision trees, the weight coefficient $\eta_t$ and the accuracy $p_t$ are assumed to follow a logistic model, and weights are assigned to the decision trees according to that model, specifically according to the following formula:

where $p_t$ is the accuracy and $\eta_t$ is the weight coefficient of the $t$-th decision tree.

In some embodiments, because the ratio of healthy to faulty samples in the field is extremely unbalanced, with healthy samples greatly outnumbering faulty samples, the Matthews correlation coefficient (MCC) is introduced as an evaluation index to better describe classification accuracy under data imbalance.

The MCC is calculated by the following formula:

$$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where TP is the number of samples that are truly faulty and classified as faulty, TN is the number of samples that are truly healthy and classified as healthy, FP is the number of samples that are truly healthy but classified as faulty, and FN is the number of samples that are truly faulty but classified as healthy.
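A small sketch shows why the MCC is better suited to unbalanced data than plain accuracy. The function name is hypothetical, and returning 0 when the denominator vanishes is a common convention adopted here as an assumption, not something stated in the text.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# 5 faulty and 95 healthy samples.
perfect = mcc(tp=5, tn=95, fp=0, fn=0)
always_healthy = mcc(tp=0, tn=95, fp=0, fn=5)
```

A tree that labels every sample as healthy reaches 95% accuracy on this data yet detects no fault at all; its MCC is 0, so it would receive no credit under an MCC-based weight, whereas the perfect tree scores 1.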

In this embodiment, referring to FIG. 1, the overall classification performance of each decision tree is evaluated by its MCC, and voting weights are assigned to the decision trees according to the MCC. This reduces the influence of poorly performing decision trees on the result, makes the output of the algorithm more reasonable, and improves the overall classification performance on the data set.

In some embodiments, the specific method of equipment anomaly detection is as follows:

For samples that have undergone anomaly detection, the sample time-series data are divided into several subsequences by a sliding window method, and the data characteristics of the subsequences are analyzed to detect whether the equipment is abnormal.

The method comprises the following steps:

Step 1: the binary state time-series data $X_i$, $i = 1, 2, \ldots, T$, of the samples of each piece of equipment are obtained by the sample anomaly detection method based on the improved random forest. $X_i = 1$ indicates that the sample at that time is abnormal, and $X_i = 0$ indicates that the sample is normal.

Step 2: anomaly detection is performed segment by segment using a sliding window strategy with step length $S_w$, where $X_{t-T}$ is the sequence observed from time $t-T$ to time $t$. The number of abnormal samples $Y_t$ in the window is:

$$Y_t = \sum_{i=t-S_w+1}^{t} X_i$$

In this embodiment, as the device continuously acquires state monitoring sample data, the trained random forest model can be used to determine whether the monitoring sample at a given time is abnormal. The device does not necessarily fail immediately after a sample anomaly is detected. On the one hand, the device may have deviated from the normal state and shown signs of abnormality without yet having reached a fault state; on the other hand, random measurement noise also causes false alarms. In practice, anomalies usually persist for a period of time, so an equipment fault is declared only when sample anomalies are detected continuously and frequently, which reduces the influence of false alarms on fault prediction accuracy. A sliding window (SW) based method is therefore adopted to divide the sample time-series data into several subsequences, and the data characteristics of the subsequences are analyzed to detect whether the device is abnormal.
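The window statistic $Y_t$ can be computed incrementally, as the following sketch shows. The function name is hypothetical, and $S_w$ is treated as the window width, which is how the index range $i = t - S_w + 1, \ldots, t$ used in the text reads; the window advances one sample at a time.

```python
def window_counts(x, s_w):
    """Number of abnormal samples Y_t in each full window of width s_w."""
    if not 1 <= s_w <= len(x):
        raise ValueError("s_w must lie between 1 and the sequence length")
    current = sum(x[:s_w])           # first full window
    counts = [current]
    for t in range(s_w, len(x)):
        current += x[t] - x[t - s_w]  # add the new sample, drop the oldest
        counts.append(current)
    return counts


isolated = window_counts([0, 0, 1, 0, 0, 0], 3)   # one noisy false positive
sustained = window_counts([0, 0, 1, 1, 1, 1], 3)  # a persistent anomaly
```

An isolated abnormal sample never pushes the count above 1, whereas a persistent anomaly fills the window, which is the property the device-level decision relies on.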

Based on the sliding-window generalized likelihood ratio test (GLRT), the observed number of sample anomalies within a fixed-width sliding window can be compared with its expected value. The Neyman-Pearson lemma states that, when choosing between two hypotheses, the test that maximizes the detection rate at a given false alarm rate is the likelihood ratio test. The basic principle is that the device is judged to have failed when the GLRT statistic exceeds a preset threshold. In the GLRT algorithm, the likelihood ratio is evaluated by replacing the unknown deterministic parameters with their maximum likelihood estimates (MLE).
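As an illustration of the principle, the following sketch evaluates a textbook one-sided GLRT statistic for Bernoulli samples in one window. It is not claimed to be the exact statistic of this method, whose formula is not reproduced in the text; the function name and the healthy-state anomaly probability `p0` are assumptions introduced for the example.

```python
import math


def bernoulli_log_glr(y, s_w, p0):
    """Log generalized likelihood ratio for y abnormal samples out of s_w.

    H0: anomaly probability equals p0 (healthy device).
    H1: anomaly probability is larger; it is replaced by its MLE y / s_w.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    p_hat = y / s_w                   # maximum likelihood estimate
    if p_hat <= p0:
        return 0.0                    # no evidence of an increased anomaly rate

    def term(count, p, q):
        # count * log(p / q), with the convention 0 * log(0) = 0
        return 0.0 if count == 0 else count * math.log(p / q)

    return term(y, p_hat, p0) + term(s_w - y, 1.0 - p_hat, 1.0 - p0)


low = bernoulli_log_glr(y=1, s_w=10, p0=0.1)
high = bernoulli_log_glr(y=8, s_w=10, p0=0.1)
```

The statistic stays at zero while the window count matches the healthy expectation and grows as anomalies accumulate, so comparing it with a failure threshold yields the device-level alarm.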

The binary state time series X_i, i = 1, 2, …, T, of each device's samples is obtained with the sample anomaly detection method based on the improved random forest. X_i = 1 indicates that the sample at that time is abnormal, and X_i = 0 indicates that it is normal. A sliding window strategy with step size S_w is applied to perform anomaly detection segment by segment, where the window at time t covers the sequence observed from t − S_w + 1 to t. The number of abnormal samples Y_t in the window is:

Y_t = Σ_{i = t − S_w + 1}^{t} X_i

For a healthy device, X_i can be regarded as a sequence of Bernoulli trials, so the probability of an abnormal sample within the window is:

P(X_i = 1) = p_i, i = t − S_w + 1, ..., t

The generalized likelihood ratio at time t can be expressed as:

Solving the above expression gives:

Given a training/test dataset of devices, the final health state Ω_i ∈ {0, 1}, i = 1, …, I, of each device is known a priori. The training/test set can therefore be expressed as {(G_1, Ω_1), ..., (G_I, Ω_I)}, where device i has m generalized likelihood ratios G_i = {G_i1, ..., G_im}. Given a failure threshold FT, the generalized likelihood ratios of device i are labeled as:

Once G_ij exceeds FT, an abnormal state is identified, a fault alarm is raised, and the time of failure is determined. As the sliding window moves, the device health state is estimated accordingly.

In another embodiment, the parameters are optimized with a cost-sensitive function. A missed fault detection creates safety and reliability hazards, while a false alarm increases quality assurance and operation and maintenance costs. Different parameter combinations yield different prediction performance, so prediction errors cause economic losses of differing degrees.

The cost function is expressed as:

E = λ_1 · N_FN + λ_2 · N_FP

where N_FN and N_FP are the numbers of devices with missed detections and with false alarms, respectively, and λ_1 and λ_2 are normalized weighting coefficients between N_FN and N_FP. By optimizing the step size S_w and the failure threshold FT, the cost function can be minimized, which balances the detection rate against the false alarm rate from the viewpoint of cost:

min_{S_w, FT} E(S_w, FT)

subject to: 1 ≤ S_w ≤ N_i,

0 ≤ λ ≤ 1,

where the cost E(S_w, FT) is an implicit function of S_w and FT, G_i is the set of generalized likelihood ratios of device i, I is the number of devices in the test set, and N_i is the number of samples of device i.
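Because E depends on S_w and FT only implicitly, one simple way to minimize it is an exhaustive search over candidate pairs. The sketch below is an assumption-laden illustration rather than the patent's optimizer: the grid search, the function names, the toy device data, and the use of the window anomaly fraction as a stand-in for the generalized likelihood ratio are all hypothetical.

```python
from itertools import product

def anomaly_fraction(y, n):
    """Stand-in window statistic (assumed): fraction of abnormal samples in the window."""
    return y / n

def device_alarm(flags, s_w, ft, statistic):
    """Return True if any window statistic of this device exceeds the threshold ft."""
    for end in range(s_w, len(flags) + 1):
        y = sum(flags[end - s_w:end])
        if statistic(y, s_w) > ft:
            return True
    return False

def cost(devices, s_w, ft, statistic, lam1=1.0, lam2=1.0):
    """E = lam1 * N_FN + lam2 * N_FP over (flags, omega) pairs, omega = 1 meaning failed."""
    n_fn = n_fp = 0
    for flags, omega in devices:
        alarm = device_alarm(flags, s_w, ft, statistic)
        n_fn += int(omega == 1 and not alarm)  # failed device that was never flagged
        n_fp += int(omega == 0 and alarm)      # healthy device that was flagged
    return lam1 * n_fn + lam2 * n_fp

def grid_search(devices, widths, thresholds, statistic, lam1=1.0, lam2=1.0):
    """Evaluate every admissible (S_w, FT) pair and return the (cost, S_w, FT) minimum."""
    shortest = min(len(flags) for flags, _ in devices)  # enforces 1 <= S_w <= N_i
    candidates = [
        (cost(devices, s_w, ft, statistic, lam1, lam2), s_w, ft)
        for s_w, ft in product(widths, thresholds)
        if 1 <= s_w <= shortest
    ]
    return min(candidates)

# Toy data (hypothetical): sample-level flags and the known final health state per device.
devices = [
    ([0, 0, 1, 0, 0, 0, 1, 0], 0),
    ([0, 1, 0, 1, 1, 1, 1, 1], 1),
    ([0, 0, 0, 0, 1, 0, 0, 0], 0),
    ([0, 0, 1, 1, 1, 0, 1, 1], 1),
]
best = grid_search(devices, [1, 2, 3, 4], [0.25, 0.5, 0.75], anomaly_fraction)
```

On this toy data a window of width 1 flags every device, including the healthy ones with isolated anomalies, while wider windows with a higher threshold separate the two groups. Raising `lam1` relative to `lam2` would bias the search toward settings that miss fewer failures at the price of more false alarms.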

Experimental verification:

Data acquisition and preprocessing:

On orbits ranging from low Earth orbit to Earth-Moon resonance orbit, at distances of 0 to 200,000 km from the Earth, a Global Navigation Satellite System (GNSS) weak-signal navigation receiver can receive, acquire, and track the normal main-lobe signals, main-lobe leakage signals, and side-lobe navigation signals transmitted by GNSS constellation satellites through its two antennas, and can stably obtain pseudo-range and carrier-phase observables. When the positioning conditions are met, it outputs positioning, orbit determination, and time information together with high-precision pulse-per-second timing information, providing technical support for autonomous orbit determination and time service for space-based objects.

Because of interference, occlusion, or changes in spacecraft attitude, the GNSS receiver may experience poor satellite signal quality, that is, an abnormal satellite reception fault. Three operating states of the GNSS receiver (normal satellite reception, sub-healthy satellite reception, and abnormal satellite reception) were simulated with a ground system, and the corresponding multidimensional engineering parameter data were collected.

Feature selection:

The multidimensional engineering parameters of the GNSS receiver contain a large amount of redundancy, which can adversely affect a data-driven anomaly monitoring algorithm. Based on expert experience, 13 key engineering parameters were therefore selected, such as 'antenna 1 BDS B1I positioning satellite count', 'antenna 1 GPS L1 lowest carrier-to-noise ratio', 'GPS 6th satellite L1 carrier-to-noise ratio', 'BDS-2 6th satellite B1I carrier-to-noise ratio', 'GPS-2 5th satellite B1I carrier-to-noise ratio', 'GPS 7th satellite L1 carrier-to-noise ratio', 'GPS 3rd satellite L1 carrier-to-noise ratio', 'antenna 2 GPS L1 lowest carrier-to-noise ratio', 'antenna 1 BDS B1I highest carrier-to-noise ratio', 'GPS 4th satellite L1 carrier-to-noise ratio', 'GPS 2nd satellite L1 carrier-to-noise ratio', and 'antenna 1 BDS B1I tracking satellite count'.

Model evaluation:

To evaluate the prediction performance, four diagnostic performance metrics were selected: accuracy (Accuracy), precision (Precision), recall (Recall), and F1 score (f1-score). Accuracy and f1-score reflect the model as a whole and are suitable for measuring its overall performance. Precision and Recall trade off against each other, and which one matters more depends on the specific application scenario.
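For reference, the four metrics can be computed from confusion counts as in the sketch below. It assumes a binary setting in which the abnormal class (label 1) is treated as positive; the function name and the example labels are hypothetical and are not the experimental data.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 with label 1 (abnormal) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels for eight samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Recall corresponds to the detection rate, while a low precision signals many false alarms, which is why the F1 score is a convenient single summary of the balance between the two.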

Table 1. Comparison of diagnostic performance between other diagnostic methods and the proposed method

Table 1 lists the results of the commonly used machine learning methods kNN, Bayes, and RF, and of the Improved RF method proposed herein, under the four evaluation criteria. Improved RF is significantly better than the other methods on Accuracy, Recall, and f1-score, and is at an acceptable level on Precision, comparable to the other algorithms. In particular, the higher F1 score indicates that the proposed method has better overall performance in GNSS receiver anomaly detection, with a high detection rate and a low false alarm rate.

By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:

The invention discloses a space payload anomaly detection method based on an improved random forest, comprising sample anomaly detection and equipment anomaly detection. First, sample anomaly detection uses an improved weighted-voting random forest algorithm that assigns larger weights to decision trees with higher accuracy, reducing the influence of poorly performing decision trees on the result, and performs anomaly detection at the sample level. Second, equipment anomaly detection maps the sample-level fault warning results to equipment-level fault warnings through a sliding window method, and optimizes the sliding window step size and the failure threshold using a cost penalty function that accounts for missed detections and false alarms, thereby balancing the detection rate against the false alarm rate. By establishing a lightweight anomaly detection model, the method suits on-orbit fault management of space payloads with limited computing power and resources. During deep space exploration missions to the Moon, Mars, and similar targets, where support from ground measurement and control resources is lacking, it helps the payload analyze on-orbit monitoring data autonomously, rapidly identify anomalies and locate faults, take fault handling and recovery measures, and assist planning decisions, ensuring safe, reliable, and stable operation of the spacecraft.

The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and these are also intended to be covered by the present invention.

Claims

1. A method for detecting anomalies in a space payload based on an improved random forest, comprising the steps of:

2. The method for improved random forest based spatial payload anomaly detection of claim 1,

the random forest algorithm comprises the following steps: constructing a decision tree and integrating voting;

wherein:

the branches represent different values of the decision nodes;

the leaf nodes represent the final diagnosed class.

3. The method for improved random forest based spatial payload anomaly detection of claim 2,

the sample abnormality detection is performed according to the following formula:

4. The method for improved random forest based spatial payload anomaly detection of claim 3,

and allocating larger weight to the decision tree according to a logic model, wherein the decision tree is specifically prepared according to the following formula:

5. The method for improved random forest based spatial payload anomaly detection of claim 4,

the Matthews correlation coefficient (MCC) is introduced as an evaluation index to better describe classification accuracy; specifically, it is computed according to the following formula:

6. The method for improved random forest based spatial payload anomaly detection of claim 5,

the specific method for detecting the equipment abnormality comprises the following steps:

comprising the following steps:

step 1, obtaining the binary state time series X_i, i = 1, 2, …, T, of each device's samples by the sample anomaly detection method based on the improved random forest; X_i = 1 indicates that the sample at that time is abnormal, and X_i = 0 indicates that it is normal;

P(X_i = 1) = p_i, i = t − S_w + 1, ..., t,

the generalized likelihood ratio at time t can be expressed as:

solving the above expression gives:

given a training and/or test dataset of devices, the final health state Ω_i ∈ {0, 1}, i = 1, …, I, of each device is known a priori, so the training/test set can be expressed as {(G_1, Ω_1), ..., (G_I, Ω_I)}, where device i has m generalized likelihood ratios G_i = {G_i1, ..., G_im}; when the failure threshold FT is given, the generalized likelihood ratios of device i are labeled as

7. The method for improved random forest based spatial payload anomaly detection of claim 6,

the device anomaly detection further includes: optimizing cost function parameters according to the cost sensitive function; the cost function is expressed as:

E = λ_1 · N_FN + λ_2 · N_FP

subject to: 1 ≤ S_w ≤ N_i,

0 ≤ λ ≤ 1,