CN117520778A - Markov blanket-based equipment assessment method and device and computer equipment - Google Patents


Info

Publication number
CN117520778A
CN117520778A (application CN202311536280.3A)
Authority
CN
China
Prior art keywords
feature
candidate
feature set
target
father
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311536280.3A
Other languages
Chinese (zh)
Inventor
孙建彬
崔瑞靖
剧伦豪
杨克巍
蒋平
葛冰峰
于海跃
吴罗福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202311536280.3A
Publication of CN117520778A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Operations Research (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The application relates to a Markov blanket-based equipment assessment method and device and computer equipment, wherein the method comprises the following steps: dividing the assessment dataset and calculating relative difference sets from the obtained local assessment data subsets and the assessment dataset, to obtain a plurality of corresponding data subsets; taking the performance index to be examined as the target variable and determining a candidate feature set; for each data subset, computing a local Markov blanket of the target variable with respect to the candidate feature set; calculating the frequency with which each specific feature occurs across the plurality of local Markov blankets and, when the frequency meets a preset condition, placing the corresponding feature into the target Markov blanket; finally, the obtained target Markov blanket serves as input data for a subsequent performance index assessment model, and during subsequent assessment of the performance index the feature data in the target Markov blanket can be acquired in a targeted manner, so that the cost of data acquisition can be greatly reduced and the efficiency, accuracy and stability of the assessment can be ensured.

Description

Markov blanket-based equipment assessment method and device and computer equipment
Technical Field
The application relates to the technical field of equipment performance assessment, in particular to an equipment assessment method, an equipment assessment device and computer equipment based on a Markov blanket.
Background
The acquisition of equipment test data is the basic work supporting equipment performance assessment. Massive test and evaluation data are often generated during equipment performance assessment; however, collecting large amounts of data is costly, and the high dimensionality, redundancy and low utilization rate of the test data often make equipment performance assessment difficult, so the data utilization rate is generally low. That is, when the equipment is evaluated using the test data, a large number of redundant features may reduce the effectiveness and efficiency of the assessment. In addition, the key features differ for different performance indexes to be examined. Therefore, in order to examine each performance index in a targeted way, screening out the corresponding key feature set can greatly reduce the feature dimension of subsequently collected sample data, reduce the cost of data collection, improve the efficiency and utilization rate of data collection, and further improve the accuracy and stability of equipment performance assessment, which is of important significance.
Key features closely related to the core performance index to be examined can be selected from the many feature dimensions through causal feature selection. The purpose of causal feature selection is to discover the Markov blanket (MB) of a class attribute (hereinafter referred to as the target variable or target node; in equipment performance assessment, the class attribute is the attribute corresponding to the performance index to be examined), thereby constructing a stable and interpretable model. The concept of the MB originates from Bayesian networks. Fig. 1 illustrates an example of the MB of a target node in a faithful Bayesian network.
Under such theoretical guarantees, research into MB-based feature selection methods has developed greatly. At present, constraint-based MB discovery algorithms have received significant attention. These algorithms use conditional independence tests in a Bayesian network to detect the MB of a target variable. Many algorithms have been proposed by scholars from different angles, considering improvements in accuracy, algorithm efficiency, the trade-off between the two, and so on. Notable algorithms include the Incremental Association Markov Blanket algorithm (IAMB) and its variants, the Simultaneous Markov Blanket learning algorithm (STMB), EAMB, and the like. Score-based methods are also notable: they first define a scoring function and then search the Bayesian network space, represented by directed acyclic graphs, for the optimal structure in order to retrieve the MB variables.
However, current feature selection methods have limitations in processing mixed data that contains both discrete and continuous features: they focus primarily on a single data type, handling either continuous or discrete data specifically. Junghye et al. first noted this problem and proposed the Mixed-MB algorithm, specifically designed for mixed-type data, which embeds a newly introduced generalized conditional independence (CI) test into Inter-IAMB. The essence of the algorithm still resides in setting different CI test functions according to variable type. The predictive permutation feature selection (Predictive Permutation Feature Selection, PPFS) algorithm is inspired by the Knockoff framework, and the predictive permutation independence (Predictive Permutation Independence, PPI) test was proposed as its basic component. However, these methods suffer from low data efficiency, and it is difficult to obtain stable MB results: significant differences are observed in the found MBs when the methods are applied to the same dataset with different sample sizes. Furthermore, the use of a simple grow-and-shrink framework in MB learning algorithms makes it difficult to guarantee the completeness and accuracy of the results.
In summary, the existing methods for screening key features for the performance index to be examined from high-dimensional data features suffer from low efficiency and low stability, so the accuracy of subsequent assessment data acquisition and equipment performance assessment cannot be ensured.
Disclosure of Invention
Based on the foregoing, it is necessary to provide a Markov blanket-based equipment assessment method, device and computer equipment, so as to improve the accuracy and stability of equipment performance assessment and to improve the efficiency of subsequent equipment assessment data acquisition.
A Markov blanket-based equipment assessment method, the method comprising:
acquiring an assessment data set of equipment; each sample data in the assessment dataset includes features of multiple dimensions;
dividing the assessment data set into a plurality of local assessment data sets, and obtaining a plurality of corresponding data subsets according to the relative difference sets of the assessment data sets and the local assessment data sets;
taking the performance index to be checked as a target variable, and taking the set of the features in the data subset as a candidate feature set; for each data subset, respectively obtaining a local Markov blanket of the target variable relative to the corresponding candidate feature set through a Markov blanket learning algorithm;
for each feature in the local Markov blankets, calculating the frequency of occurrence of the feature across the plurality of local Markov blankets, and when the frequency meets the preset condition, putting the corresponding feature into the target Markov blanket;
and taking the characteristics in the finally obtained target Markov blanket as key characteristics of the performance index to be checked, and inputting the key characteristics into a trained classifier to obtain a performance check result.
An equipment assessment device based on a Markov blanket, the device comprising:
the equipment assessment data acquisition module is used for acquiring an assessment data set of equipment; each sample data in the assessment dataset includes features of multiple dimensions;
the assessment data set dividing module is used for dividing the assessment data set into a plurality of local assessment data sets, and obtaining a plurality of corresponding data subsets according to the relative difference sets of the assessment data sets and the local assessment data sets;
the candidate feature set dividing module is used for taking the performance index to be checked as a target variable and taking the set of features in the data subset as a candidate feature set;
The local Markov blanket learning module is used for respectively obtaining local Markov blankets of the target variable relative to the corresponding candidate feature set through a Markov blanket learning algorithm for each data subset;
the feature screening module is used for calculating, for each feature in the local Markov blankets, the frequency of its occurrence across the plurality of local Markov blankets, and when the frequency meets the preset condition, putting the corresponding feature into the target Markov blanket;
the performance index checking module is used for taking the characteristics in the finally obtained target Markov blanket as key characteristics of the performance index to be checked, and inputting the key characteristics into the trained classifier to obtain a performance checking result.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring an assessment data set of equipment; each sample data in the assessment dataset includes features of multiple dimensions;
dividing the assessment data set into a plurality of local assessment data sets, and obtaining a plurality of corresponding data subsets according to the relative difference sets of the assessment data sets and the local assessment data sets;
taking the performance index to be checked as a target variable, and taking the set of the features in the data subset as a candidate feature set;
for each data subset, respectively obtaining a local Markov blanket of the target variable relative to the corresponding candidate feature set through a Markov blanket learning algorithm;
for each feature in the local Markov blankets, calculating the frequency of occurrence of the feature across the plurality of local Markov blankets, and when the frequency meets the preset condition, putting the corresponding feature into the target Markov blanket;
and taking the characteristics in the finally obtained target Markov blanket as key characteristics of the performance index to be checked, and inputting the key characteristics into a trained classifier to obtain a performance check result.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring an assessment data set of equipment; each sample data in the assessment dataset includes features of multiple dimensions;
dividing the assessment data set into a plurality of local assessment data sets, and obtaining a plurality of corresponding data subsets according to the relative difference sets of the assessment data sets and the local assessment data sets;
taking the performance index to be checked as a target variable, and taking the set of the features in the data subset as a candidate feature set;
for each data subset, respectively obtaining a local Markov blanket of the target variable relative to the corresponding candidate feature set through a Markov blanket learning algorithm;
for each feature in the local Markov blankets, calculating the frequency of occurrence of the feature across the plurality of local Markov blankets, and when the frequency meets the preset condition, putting the corresponding feature into the target Markov blanket;
and taking the characteristics in the finally obtained target Markov blanket as key characteristics of the performance index to be checked, and inputting the key characteristics into a trained classifier to obtain a performance check result.
The Markov blanket-based equipment assessment method, device and computer equipment first divide the assessment dataset and compute relative difference sets from the obtained local assessment data subsets and the assessment dataset, obtaining a plurality of corresponding data subsets; the performance index to be examined is then taken as the target variable, and a candidate feature set is determined; next, for each data subset, a local Markov blanket of the target variable with respect to the candidate feature set is computed; the frequency with which each specific feature occurs across the plurality of local Markov blankets is then calculated and, when the frequency meets the preset condition, the corresponding feature is placed into the target Markov blanket; finally, the obtained target Markov blanket can be used as input data for a subsequent performance index assessment model, and during subsequent assessment of the performance index the feature data in the target Markov blanket can be acquired in a targeted manner, so that the cost of data acquisition can be greatly reduced and the efficiency, accuracy and stability of equipment assessment can be ensured.
Drawings
FIG. 1 is an example diagram of the MB of a target node in a faithful Bayesian network;
FIG. 2 is a flow diagram of a Markov blanket-based equipment assessment method in one embodiment;
FIG. 3 is an internal block diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, as shown in fig. 2, there is provided a markov blanket-based equipment assessment method comprising the steps of:
step 202, acquiring an assessment dataset of the equipment.
Wherein each sample data in the assessment dataset includes features of multiple dimensions. The features may be of a single type, i.e. only discrete data or only continuous data, or of a hybrid type, i.e. comprising both discrete and continuous features.
Taking certain unmanned equipment as an example, the feature types of the assessment dataset include basic parameter features of the equipment and natural environment features of the mission scenario (such as air temperature, sea state, and wind and waves).
Step 204, dividing the assessment dataset into a plurality of local assessment datasets, and obtaining a plurality of corresponding data subsets according to the relative difference sets of the assessment dataset and each local assessment dataset.
Denote the assessment dataset by D = (U, T), where n is the sample size, d is the number of candidate features, U is the candidate feature set, and T is the target vector of size n×1.
The dataset is divided into K different subsets, each subset containing the same features, except that the samples are different. For example, a data set is 1000 rows and 100 columns, i.e., the number of samples is 1000 and the number of features is 100. If it is divided into k=10 local data sets, each local data set is 100 rows and 100 columns.
Suppose the assessment dataset D is divided into K local assessment datasets D_1, D_2, ..., D_K; the corresponding data subsets are then the relative difference sets D \ D_k, k = 1, 2, ..., K.
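As an illustrative sketch (not code from the patent; the NumPy array representation and the function name are assumptions for the example), the partitioning of D into K local datasets and the construction of the relative difference sets can be written as:

```python
import numpy as np

def make_data_subsets(D, K, seed=0):
    """Split the samples of assessment dataset D into K local datasets
    D_1..D_K and return the relative difference sets, i.e. for each k,
    every sample NOT in the k-th local dataset."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(D.shape[0]), K)
    return [np.delete(D, fold, axis=0) for fold in folds]

# The example from the text: 1000 samples x 100 features split with K = 10,
# so each local dataset has 100 rows and each difference set keeps 900 rows.
D = np.zeros((1000, 100))
subsets = make_data_subsets(D, K=10)
```

Learning each local Markov blanket on the complement of the small k-th fold, rather than on the fold itself, means every subset still retains most of the samples, which is what makes repeated blanket learning on the subsets data-efficient.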
Step 206, taking the performance index to be checked as a target variable, and taking the set of features in the data subset as a candidate feature set.
For example: for certain unmanned equipment, indexes such as the task success rate and the reconnaissance efficiency are to be examined; the examined task success rate and reconnaissance efficiency are then the target variables.
Step 208, for each subset of data, obtaining a local Markov blanket of the target variable with respect to the corresponding candidate feature set through a Markov blanket learning algorithm.
Thus, the local Markov blankets obtained for different data subsets are not exactly the same.
Step 210, for each feature in the local Markov blankets, calculating the frequency of occurrence of the feature across the plurality of local Markov blankets, and when the frequency meets the preset condition, putting the corresponding feature into the target Markov blanket.
An algorithm is executed on each data subset to learn a Markov blanket, improving the utilization of the assessment data. For example: the assessment dataset is divided into 3 parts, and the key features selected on the subsets are {A, B, C, D}, {A, B, C, E} and {A, B, C, F}. The three features A, B and C each appear 3 times, the maximum frequency, so according to the aggregation policy the final selected key feature set is {A, B, C}.
This means that only features with frequencies above the threshold will be selected for inclusion in MB (T). A significant advantage of this approach is that it allows for the selection of stable MBs, while also allowing for control of the number of features.
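The aggregation policy in the example above can be written out directly. A minimal sketch (function and variable names are invented for the example):

```python
from collections import Counter

def aggregate_markov_blankets(local_mbs, q):
    """Keep a feature in the target Markov blanket only when its frequency
    of occurrence across the K local blankets is at least the threshold q."""
    K = len(local_mbs)
    counts = Counter(f for mb in local_mbs for f in set(mb))
    return {f for f, c in counts.items() if c / K >= q}

# The worked example from the text: A, B and C appear in all three local
# blankets, so with threshold q = 1.0 they alone form the final key set.
local_mbs = [{"A", "B", "C", "D"}, {"A", "B", "C", "E"}, {"A", "B", "C", "F"}]
mb_t = aggregate_markov_blankets(local_mbs, q=1.0)
```

Lowering q trades stability for recall: a threshold of 0.3 here would keep all six features.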
And 212, taking the characteristics in the finally obtained target Markov blanket as key characteristics of the performance index to be checked, and inputting the key characteristics into a trained classifier to obtain a performance check result.
It should be understood that although the steps in the flowchart of Fig. 2 are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in Fig. 2 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different times; these sub-steps or stages are likewise not necessarily executed sequentially, and may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
In one embodiment, obtaining the local Markov blanket of the target variable with respect to the corresponding candidate feature set through a Markov blanket learning algorithm, respectively, comprises:
identifying an intermediate parent-child feature set associated with the target variable; the intermediate father-son feature set is obtained by sequentially carrying out feature growth and contraction on the candidate father-son feature set through a combination condition independence test;
acquiring a first relative difference set of a candidate feature set, a target variable and a corresponding intermediate father-son feature set, arranging candidate features in the first relative difference set in a descending order according to the degree of dependence with the target variable, selecting a plurality of candidate features according to a preset proportion, and forming a candidate feature queue;
for each candidate feature in the candidate feature queue, identifying an intermediate father-son feature set associated with the candidate feature, and if the intermediate father-son feature set associated with the candidate feature contains a target variable, adding the corresponding candidate feature into the intermediate father-son feature set associated with the target variable to obtain a target father-son feature set of the target variable;
identifying a target partner feature set of the target variable according to the target father-son feature set;
a local markov blanket of the corresponding candidate feature set is constructed from the target partner feature set and the target parent-child feature set.
In one embodiment, identifying the target spouse feature set of the target variable from the target parent-child feature set includes:
for each parent-child feature in the target parent-child feature set, identifying the intermediate parent-child feature set associated with that parent-child feature, and if this set contains the target variable, adding the corresponding parent-child feature to the intermediate parent-child feature set associated with the target variable; taking the union of the intermediate parent-child feature sets identified for the parent-child features to obtain an intermediate spouse feature union set;
calculating a third relative difference set of the intermediate spouse feature union set with respect to the target parent-child feature set and the target variable, to obtain a candidate spouse feature set;
and for each candidate spouse feature in the candidate spouse feature set, if the candidate spouse feature is conditionally dependent on the target variable given the target parent-child feature set of the target variable, adding the corresponding candidate spouse feature to the spouse feature set, finally obtaining the target spouse feature set of the target variable.
In one embodiment, identifying the intermediate parent-child feature set associated with the target variable includes:
in the current iteration, if a candidate feature in the candidate feature set is independent of the target variable given the candidate parent-child feature set of the target variable, deleting the corresponding candidate feature from the candidate feature set to obtain a new candidate feature set;
selecting from the new candidate feature set the associated feature with the greatest dependence on the target variable; if the associated feature and the target variable are dependent given the candidate parent-child feature set of the target variable, adding the associated feature to the candidate parent-child feature set to obtain the candidate parent-child feature set output by the current iteration, and deleting the associated feature from the new candidate feature set to obtain the candidate feature set output by the current iteration;
after the iterations finish, an initial parent-child feature set is obtained;
and performing feature contraction on the initial parent-child feature set in combination with conditional independence tests to obtain the intermediate parent-child feature set.
In one embodiment, performing feature contraction on the initial parent-child feature set in combination with conditional independence tests to obtain the intermediate parent-child feature set includes: if a candidate parent-child feature in the initial parent-child feature set and the target variable are independent given the corresponding second relative difference set, deleting the corresponding candidate parent-child feature from the initial parent-child feature set to obtain the intermediate parent-child feature set; the corresponding second relative difference set is the relative difference set of the initial parent-child feature set and the corresponding candidate parent-child feature set.
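The growth-then-contraction procedure of this embodiment can be sketched generically. The conditional independence test and the dependence measure are left as pluggable callables, since the patent substitutes different tests (e.g. the PPI test for mixed data); every name here is an assumption for illustration:

```python
def grow_shrink_pc(target, candidates, indep, dep):
    """Growth-then-contraction sketch. `indep(x, t, z)` is a pluggable
    conditional-independence test returning True when feature x and target t
    are independent given conditioning set z; `dep(x, t, z)` returns a
    dependence strength used to rank candidates."""
    pc = []                      # candidate parent-child set of the target
    remaining = list(candidates)
    # Growth: repeatedly add the remaining feature most dependent on the
    # target, first discarding features independent of it given current pc.
    while remaining:
        remaining = [x for x in remaining if not indep(x, target, pc)]
        if not remaining:
            break
        best = max(remaining, key=lambda x: dep(x, target, pc))
        pc.append(best)
        remaining.remove(best)
    # Contraction: drop any member independent of the target given the rest.
    for x in list(pc):
        rest = [y for y in pc if y != x]
        if indep(x, target, rest):
            pc.remove(x)
    return pc
```

The contraction pass removes the false positives that the growth pass can admit, mirroring the two steps of Algorithm 2 described below.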
As shown in Table 1, an algorithm flow for seeking the local Markov blanket is provided.
Table 1 Algorithm flow for seeking the local Markov blanket
In Table 1, PC(T) on line 1 represents the intermediate parent-child feature set associated with the target variable, and PC(T) on line 7 represents the target parent-child feature set of the target variable.
As shown in Table 2, an algorithm flow for seeking the intermediate parent-child feature set is provided.
Table 2 Algorithm flow for seeking the intermediate parent-child feature set
First, in Algorithm 2, the dep(·) function (line 6) calculates the strength of dependence between variables by means of a CI (conditional independence) test; its purpose is to identify the node with the greatest dependence on the target variable T. The candidate feature space is continually compressed during the iterations so as to avoid redundant CI tests as much as possible (lines 4 and 9).
Step 1 may produce some false-positive features in PC(T), which can lead to erroneous causal feature selection. To solve this problem, Algorithm 2 introduces a second step.
Second, in Algorithm 2, this step aims to detect and eliminate false positives within PC(T) through additional CI tests. Nevertheless, a mechanism is still needed in Part I of Algorithm 1 to recall PC nodes that may be missed due to the PCMasking phenomenon. In this application, the terms node, feature and variable have the same meaning. Definition (PCMasking): let S_1 and S_2 be subsets of the parent-child features of T. If T is independent of S_1 given S_2, and T is independent of S_2 given S_1, then S_1 and S_2 are said to be PCMasking with respect to T. The appearance of PCMasking can be attributed to the difference between the distribution of the sample data and the true distribution.
To mitigate the effects of PCMasking, Algorithm 1 contains lines 2-9, which are specifically designed to reduce the effect of conditional dependencies that are not present in the true data distribution. Note that missing a PC node because of PCMasking is relatively rare: variables that show strong dependence on T are more likely to be identified as PC nodes. It is therefore unnecessary to process all variables again; only the variables most highly correlated with T (a proportion k of them) need be included in the candidate queue Q, where k is an adjustable parameter with k ∈ [0, 1].
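The recall queue with ratio k can be sketched as follows (an illustrative sketch; `dep` stands for the same dependence-strength function used above, and the function name is invented):

```python
def build_candidate_queue(features, target, dep, k):
    """Rank features by their dependence on the target and keep only the
    top proportion k (k in [0, 1]) as the recall queue Q."""
    ranked = sorted(features, key=lambda x: dep(x, target), reverse=True)
    return ranked[: round(k * len(ranked))]
```

Setting k = 1 re-examines every feature, while small k restricts the extra CI tests to the strongest candidates.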
Part II (lines 10-22 of Algorithm 1) is created specifically to supplement the SP (spouse) variables of MB(T). It contains a candidate-set generation strategy for the SP variables (lines 10-16) and a strategy for identifying the SPs (lines 17-21). These strategies are driven by the following definitions and proposition.
Definition (AND rule): given a Bayesian network over the variable set U and X_i, X_j ∈ U, if X_i ∈ PC(X_j) and X_j ∈ PC(X_i) hold simultaneously, then X_i is a parent node (or child node) of X_j. If only one of the two memberships is required, the criterion is defined as the OR rule.
Definition (representation set): given the target T, the representation set of T with respect to X_i is R_T(X_i) = PC(X_i) \ (PC(T) ∪ {T}), i.e. the parent-child features of X_i that lie outside PC(T) and are not T itself.
The AND rule provides a strict constraint that prevents the introduction of redundant representation sets. More precisely, for a variable Y_i with T ∈ PC(Y_i), the representation set with reference to Y_i can be written as R_T(Y_i) = PC(Y_i) \ (PC(T) ∪ {T}).
Proposition: given PC(T) and SP(T), let the representation set of T be R_T = ∪_{Y ∈ PC(T)} R_T(Y); then SP(T) ⊆ R_T.
Proof: assume Z ∈ SP(T). By the definition of a spouse, there exists a common child Y such that Y ∈ PC(Z) and Y ∈ PC(T). By the AND rule, Z ∈ PC(Y). Since Z ∉ PC(T) and Z ≠ T, by the definition of the representation set Z ∈ R_T(Y) ⊆ R_T, which completes the proof.
The candidate set CanSP for spouses is thus generated according to the above proposition (line 16). Then the collider-structure detection procedure (lines 17-21) is performed to obtain SP(T). The CAMB algorithm finally returns MB(T) = PC(T) ∪ SP(T).
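Under these definitions, the spouse-candidate generation reduces to a union over the parent-child sets of the target's own parent-child features. A minimal sketch (names invented; `pc_of` stands for a lookup of previously computed intermediate parent-child sets):

```python
def candidate_spouses(target, pc, pc_of):
    """Generate CanSP: search for spouses of the target among the
    parent-child sets of its own parent-child features, then exclude
    PC(target) and the target itself."""
    can_sp = set()
    for y in pc:
        can_sp |= set(pc_of(y))
    return can_sp - set(pc) - {target}
```

The subsequent collider detection would then test each member of CanSP for conditional dependence on the target, as in the embodiment above.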
In one embodiment, for each feature within a local Markov blanket, the frequency of its occurrence across the plurality of local Markov blankets is calculated, and when the frequency satisfies a preset condition the corresponding feature is placed into the target Markov blanket:

MB(T) = { X : freq(X) ≥ q },  freq(X) = (1/K) · |{ k : X ∈ MB_k(T) }|,

wherein T represents the target variable, MB(T) represents the Markov blanket of the target variable, MB_k(T) represents the kth local Markov blanket of the target variable, K represents the total number of local Markov blankets, i.e., the total number of data subsets, freq(X) represents the frequency of occurrence of feature X within the local Markov blankets, and q represents the frequency threshold.
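The frequency criterion above can be illustrated with a short sketch; the function and variable names are assumed for illustration.

```python
def aggregate_markov_blankets(local_mbs, q=0.5):
    """Keep feature X in MB(T) when freq(X) = |{k : X in MB_k(T)}| / K >= q."""
    K = len(local_mbs)
    counts = {}
    for mb_k in local_mbs:
        for x in mb_k:
            counts[x] = counts.get(x, 0) + 1
    return {x for x, c in counts.items() if c / K >= q}

# Example: X1 appears in 3/3 local blankets, X2 in 2/3, X3 in 1/3.
# With q = 0.6 only X1 and X2 survive into the target Markov blanket.
mb = aggregate_markov_blankets([{"X1", "X2"}, {"X1", "X2", "X3"}, {"X1"}], q=0.6)
```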
In one embodiment, when each sample in the assessment dataset contains both continuous and discrete features, the conditional independence (CI) test is replaced with the predictive permutation independence (PPI) test. The basic idea of the PPI test is to recast the CI test as a supervised learning problem. The current feature X ∈ U and the condition set Z ⊆ U are combined into X' = {X, Z}. The PPI test first produces B copies of the data, denoted D_1, ..., D_B, where B is an adjustable hyperparameter. For each D_b, a random training-test split is performed, forming a training set D_b^train and a test set D_b^test, with a supervised learning model f_b trained on each D_b^train. Let l(T', T) be a loss function defined on the predicted output T' and the true result T. Then, on each D_b^test, the empirical risk is:

R_b = (1 / |D_b^test|) · Σ l(f_b(X'), T),

wherein f_b denotes the supervised learning model trained on the corresponding D_b^train. To determine the importance of the current feature X, R_b must be compared with the risk obtained from its imitation X'^π, which is generated by permuting X, i.e., scrambling the values of X while leaving Z and T unchanged. In this case, the empirical risk is:

R_b^π = (1 / |D_b^test|) · Σ l(f_b^π(X'^π), T),

wherein f_b^π denotes the model trained on the permuted copy. In summary, the null hypothesis of the CI test can be equivalently expressed as: H_0: X ⊥ T | Z, i.e., the risks {R_b} and {R_b^π} follow the same distribution.
PPI uses the Wilcoxon signed-rank test to verify this null hypothesis. If the p-value is smaller than a given significance level, the null hypothesis is rejected.
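A simplified, self-contained sketch of the PPI risk comparison follows. A 1-nearest-neighbour classifier stands in for the generic supervised learner, and a normal approximation of the one-sided signed-rank statistic (ignoring rank ties) stands in for the exact Wilcoxon test, so the numbers it produces are only illustrative; in practice one would use an arbitrary model plus `scipy.stats.wilcoxon`. All names are assumptions.

```python
import math
import random

def ppi_test(x_col, z_cols, labels, B=20, seed=0):
    """Illustrative PPI sketch: compare empirical risks with and without
    permuting the current feature X over B random train/test splits."""
    rng = random.Random(seed)
    n = len(labels)
    rows = [[x_col[i]] + [z[i] for z in z_cols] for i in range(n)]

    def risk(data, train_idx, test_idx):
        # Empirical 0-1 risk of a 1-NN classifier "trained" on the train split.
        errors = 0
        for i in test_idx:
            j = min(train_idx,
                    key=lambda t: sum((a - b) ** 2
                                      for a, b in zip(data[t], data[i])))
            errors += labels[j] != labels[i]
        return errors / len(test_idx)

    diffs = []
    for _ in range(B):
        idx = list(range(n))
        rng.shuffle(idx)
        split = int(0.7 * n)
        tr, te = idx[:split], idx[split:]
        r = risk(rows, tr, te)                    # risk with the real X
        x_perm = [row[0] for row in rows]
        rng.shuffle(x_perm)                       # scramble X, keep Z and T
        perm = [[x_perm[i]] + rows[i][1:] for i in range(n)]
        diffs.append(risk(perm, tr, te) - r)      # permutation penalty

    # One-sided signed-rank statistic, normal approximation (ties ignored).
    # Under H0 (X independent of T given Z), permuting X does not raise risk.
    nz = [d for d in diffs if d != 0]
    if not nz:
        return 1.0
    order = sorted(range(len(nz)), key=lambda i: abs(nz[i]))
    w_plus = sum(rank + 1 for rank, i in enumerate(order) if nz[i] > 0)
    m = len(nz)
    mu = m * (m + 1) / 4
    sigma = math.sqrt(m * (m + 1) * (2 * m + 1) / 24)
    z_score = (w_plus - mu) / sigma
    return 0.5 * math.erfc(z_score / math.sqrt(2))  # small p -> reject H0
```

A small p-value indicates that scrambling X degrades prediction of T even given Z, i.e., X carries information about T beyond Z.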
As shown in Table 3, an algorithm flow is provided that divides the assessment dataset into a plurality of local assessment datasets, further obtains the data subsets, and separately obtains the local Markov blankets.
The supervised learning model effectively enables the PPI test to process mixed data. The CAMB algorithm (i.e., the algorithms in Tables 1 and 2) combined with the PPI test can identify MBs from mixed data. This combination allows the associations and dependencies in the data to be explored efficiently regardless of its mixed nature. However, the presence of PCMasking tends to reduce the sample efficiency of many MB discovery methods, and low sample efficiency makes it difficult to represent the true distribution of the raw data accurately. Furthermore, when the data violates the faithfulness assumption, the MB of the target variable may not be unique. To improve sample efficiency and obtain a unique MB of the target variable, we designed a CAMB-based stable feature selection algorithm for mixed data, called SM-CAMB (see Algorithm 3). We also evaluate the prediction accuracy of the selected MB in downstream tasks (e.g., classification).
Table 3 shows the algorithm flow for dividing the assessment dataset and finding the local Markov blanket
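The division step of Table 3 can be sketched as follows (helper name assumed): each local set D_k is a disjoint fold of the assessment dataset, and the data subset used for the kth local Markov blanket is the relative complement D \ D_k.

```python
import random

def make_data_subsets(dataset, K=3, seed=0):
    """Split the assessment dataset D into K disjoint local sets D_k and
    return the K relative complements D \\ D_k (illustrative sketch)."""
    rng = random.Random(seed)
    idx = list(range(len(dataset)))
    rng.shuffle(idx)
    folds = [idx[i::K] for i in range(K)]
    # Subset k = all samples NOT in local set k (the relative difference set).
    return [[dataset[i] for i in sorted(set(idx) - set(fold))]
            for fold in folds]

# Example: 9 samples, K = 3 -> three subsets of 6 samples each.
subsets = make_data_subsets(list(range(9)), K=3)
```

Each sample is excluded from exactly one subset, so the K subsets overlap heavily, which is what makes the frequency-based aggregation of the local blankets meaningful.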
In a particular downstream task, a newly constructed dataset containing only the MB features and the target variable is used as input. To evaluate the accuracy of the model trained on the MB features, K-fold cross-validation is used (lines 6-11 in Algorithm 3).
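The K-fold evaluation of lines 6-11 might look like the following sketch, where `train_fn` is a stand-in for any classifier-training routine; the helper name and signature are assumptions, not the patent's notation.

```python
import random

def kfold_accuracy(samples, labels, train_fn, K=5, seed=0):
    """Mean held-out accuracy over K folds (illustrative sketch).

    train_fn(train_x, train_y) must return a predict(x) callable.
    """
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    folds = [idx[i::K] for i in range(K)]
    accs = []
    for k in range(K):
        test = set(folds[k])
        train = [i for i in idx if i not in test]
        predict = train_fn([samples[i] for i in train],
                           [labels[i] for i in train])
        correct = sum(predict(samples[i]) == labels[i] for i in test)
        accs.append(correct / len(test))
    return sum(accs) / K

# Trivial example: a constant classifier on constant labels scores 1.0.
acc = kfold_accuracy(list(range(10)), [0] * 10,
                     lambda xs, ys: (lambda x: ys[0]), K=5)
```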
In one embodiment, as shown in the figure, a Markov blanket-based equipment assessment device is provided, comprising: an equipment assessment data acquisition module, an assessment dataset dividing module, a candidate feature set dividing module, a local Markov blanket learning module, a feature screening module and a performance index assessment module, wherein:
the equipment assessment data acquisition module is used for acquiring an assessment data set of equipment; each sample data in the assessment dataset includes features of multiple dimensions;
the assessment data set dividing module is used for dividing the assessment data set into a plurality of local assessment data sets, and obtaining a plurality of corresponding data subsets according to the relative difference between the assessment data set and each local assessment data set;
the candidate feature set dividing module is used for taking the performance index to be checked as a target variable and taking the set of features in the data subset as a candidate feature set;
the local Markov blanket learning module is used for respectively obtaining local Markov blankets of the target variable relative to the corresponding candidate feature set through a Markov blanket learning algorithm for each data subset;
the feature screening module is used for calculating the frequency of each feature in the local Markov blanket in the plurality of local Markov blankets, and when the frequency meets the preset condition, the corresponding feature is put into the target Markov blanket;
the performance index checking module is used for taking the characteristics in the finally obtained target Markov blanket as key characteristics of the performance index to be checked, and inputting the key characteristics into the trained classifier to obtain a performance checking result.
For specific limitations of the Markov blanket-based equipment assessment device, reference may be made to the limitations of the Markov blanket-based equipment assessment method above, and no further description is given herein. The various modules in the Markov blanket-based equipment assessment device described above may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware, may be independent of the processor in the computer device, or may be stored as software in a memory of the computer device, so that the processor may call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a Markov blanket-based equipment assessment method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device of the computer device may be a touch layer covering the display screen, keys, a trackball or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad or mouse, etc.
It will be appreciated by those skilled in the art that the structure shown in fig. 3 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In an embodiment a computer device is provided comprising a memory storing a computer program and a processor implementing the steps of the method of the above embodiments when the computer program is executed.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method of the above embodiments.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between the combinations of the technical features, they should be considered to be within the scope of this specification.
The above embodiments merely represent several implementations of the present application; their description is relatively specific and detailed, but should not therefore be construed as limiting the scope of the invention. It should be noted that various modifications and improvements could be made by those skilled in the art without departing from the spirit of the present application, and these would fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (10)

1. A markov blanket-based equipment assessment method, the method comprising:
acquiring an assessment data set of equipment; each sample data in the assessment data set contains characteristics of multiple dimensions;
dividing the assessment data set into a plurality of local assessment data sets, and obtaining a plurality of corresponding data subsets according to the relative difference sets of the assessment data sets and the local assessment data sets;
taking the performance index to be checked as a target variable, and taking the set of the characteristics in the data subset as a candidate characteristic set;
for each data subset, respectively obtaining a local Markov blanket of the target variable relative to the corresponding candidate feature set through a Markov blanket learning algorithm;
for each feature in the local Markov blankets, calculating the frequency of its occurrence within the plurality of local Markov blankets, and when the frequency meets a preset condition, putting the corresponding feature into a target Markov blanket;
and taking the characteristics in the finally obtained target Markov blanket as key characteristics of the performance index to be checked, and inputting the key characteristics into a trained classifier to obtain a performance check result.
2. The method of claim 1, wherein, for each feature within a local Markov blanket, calculating the frequency of its occurrence within the plurality of local Markov blankets, and placing the corresponding feature into the target Markov blanket when the frequency satisfies a preset condition, is:

MB(T) = { X : freq(X) ≥ q },  freq(X) = (1/K) · |{ k : X ∈ MB_k(T) }|,

wherein T represents the target variable, MB(T) represents the Markov blanket of the target variable, MB_k(T) represents the kth local Markov blanket of the target variable, K represents the total number of local Markov blankets, i.e., the total number of data subsets, freq(X) represents the frequency of occurrence of feature X within the local Markov blankets, and q represents the frequency threshold.
3. The method according to claim 1, wherein obtaining the local Markov blanket of the target variable with respect to the corresponding candidate feature set through a Markov blanket learning algorithm, respectively, comprises:
identifying an intermediate parent-child feature set associated with the target variable; the intermediate parent-child feature set is obtained by sequentially performing feature growth and shrinkage on the candidate parent-child feature set in combination with conditional independence tests;
acquiring a first relative difference set of the candidate feature set with respect to the target variable and the corresponding intermediate parent-child feature set, arranging the candidate features in the first relative difference set in descending order of their degree of dependence on the target variable, and selecting a number of candidate features according to a preset proportion to form a candidate feature queue;
for each candidate feature in the candidate feature queue, identifying an intermediate parent-child feature set associated with the candidate feature, and if the intermediate parent-child feature set associated with the candidate feature contains the target variable, adding the corresponding candidate feature into the intermediate parent-child feature set associated with the target variable to obtain a target parent-child feature set of the target variable;
identifying a target spouse feature set of the target variable according to the target parent-child feature set;
and constructing a local Markov blanket over the corresponding candidate feature set from the target spouse feature set and the target parent-child feature set.
4. The method of claim 3, wherein identifying an intermediate set of parent-child features associated with the target variable comprises:
in the current iteration, if a candidate feature in the candidate feature set is independent of the target variable conditioned on the candidate parent-child feature set of the target variable, deleting the corresponding candidate feature from the candidate feature set to obtain a new candidate feature set;
selecting, from the new candidate feature set, the associated feature with the greatest degree of dependence on the target variable; if the associated feature and the target variable are dependent conditioned on the candidate parent-child feature set of the target variable, adding the associated feature into the candidate parent-child feature set to obtain the candidate parent-child feature set output by the current iteration, and deleting the associated feature from the new candidate feature set to obtain the candidate feature set output by the current iteration;
after the iterations finish, obtaining an initial parent-child feature set;
and performing feature shrinkage on the initial parent-child feature set in combination with conditional independence tests to obtain the intermediate parent-child feature set.
5. The method of claim 4, wherein performing feature shrinkage on the initial parent-child feature set in combination with conditional independence tests to obtain the intermediate parent-child feature set comprises:
if a candidate parent-child feature in the initial parent-child feature set and the target variable are independent conditioned on the corresponding second relative difference set, deleting the corresponding candidate parent-child feature from the initial parent-child feature set to obtain the intermediate parent-child feature set; the corresponding second relative difference set is the relative difference set of the initial parent-child feature set and the corresponding candidate parent-child feature.
6. The method according to claim 3, wherein identifying a target spouse feature set of the target variable from the target parent-child feature set comprises:
for each parent-child feature in the target parent-child feature set, identifying an intermediate parent-child feature set associated with the parent-child feature; if the intermediate parent-child feature set associated with the parent-child feature contains the target variable, adding the corresponding parent-child feature into the intermediate parent-child feature set associated with the target variable, obtaining an intermediate spouse feature set related to the parent-child feature, and obtaining a candidate spouse feature union of each parent-child feature;
calculating a third relative difference set between the intermediate spouse feature union and both the target parent-child feature set and the target variable to obtain a candidate spouse feature set;
and for each candidate spouse feature in the candidate spouse feature set, if the candidate spouse feature and the target variable are dependent conditioned on the target parent-child feature set of the target variable, adding the corresponding candidate spouse feature into the spouse feature set, and finally obtaining the target spouse feature set of the target variable.
7. The method according to claim 3, wherein the conditional independence test is replaced with a predictive permutation independence test when each sample data in the assessment dataset includes both continuous and discrete features.
8. A markov blanket-based equipment assessment device, the device comprising:
the equipment assessment data acquisition module is used for acquiring an assessment data set of equipment; each sample data in the assessment data set includes a plurality of dimensional features;
the assessment data set dividing module is used for dividing the assessment data set into a plurality of local assessment data sets, and obtaining a plurality of corresponding data subsets according to the relative difference between the assessment data set and each local assessment data set;
the candidate feature set dividing module is used for taking the performance index to be checked as a target variable and taking the set of features in the data subset as a candidate feature set;
the local Markov blanket learning module is used for respectively obtaining local Markov blankets of the target variable relative to the corresponding candidate feature set through a Markov blanket learning algorithm for each data subset;
the feature screening module is used for calculating the frequency of each feature in the local Markov blanket in the plurality of local Markov blankets, and when the frequency meets the preset condition, the corresponding feature is put into the target Markov blanket;
and the performance index checking module is used for taking the characteristics in the finally obtained target Markov blanket as key characteristics of the performance index to be checked, and inputting the key characteristics into a trained classifier to obtain a performance checking result.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202311536280.3A 2023-11-16 2023-11-16 Markov blanket-based equipment assessment method and device and computer equipment Pending CN117520778A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311536280.3A CN117520778A (en) 2023-11-16 2023-11-16 Markov blanket-based equipment assessment method and device and computer equipment


Publications (1)

Publication Number Publication Date
CN117520778A true CN117520778A (en) 2024-02-06

Family

ID=89743403



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination