CN114492007A - Factor effect online identification method and device based on hierarchical error control - Google Patents


Publication number
CN114492007A
CN114492007A CN202210047863.9A CN202210047863A CN114492007A CN 114492007 A CN114492007 A CN 114492007A CN 202210047863 A CN202210047863 A CN 202210047863A CN 114492007 A CN114492007 A CN 114492007A
Authority
CN
China
Prior art keywords
effect
factor
test
interaction
main
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210047863.9A
Other languages
Chinese (zh)
Inventor
施文
谢翔
陈晓红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN202210047863.9A
Publication of CN114492007A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and device for online identification of factor effects based on hierarchical error control. The method comprises: initializing the parameters for main effect and interaction effect identification; identifying, in real time, the factor that varies in a simulation model driven by online data, and constructing a real-time design matrix; calculating the base effects of the variation factor from the output of the simulation model; transforming the base effects into new samples for the main effect and interaction effect hypothesis tests, resampling each new sample with replacement to obtain a Bootstrap sample, and repeating this to obtain B Bootstrap samples for each test; calculating the test statistic of each Bootstrap sample and, from these, the p-values of the main effect and interaction effect tests; and comparing each p-value with its own test level to decide whether the variation factor has an important main effect or interaction effect, which completes the identification of the variation factor at any observation point. The invention identifies the importance of factor effects online while controlling the error of the resulting decisions.

Description

Factor effect online identification method and device based on hierarchical error control
Technical Field
The invention belongs to the field of simulation modeling and analysis, and particularly relates to a factor effect online identification method and device based on hierarchical error control.
Background
In the context of Industry 4.0, technologies such as the Internet of Things, business intelligence, 5G and cloud computing are being integrated into large-scale applications, enabling real-time data transmission, processing and feedback. The digital twin, a system modeling approach built on real-time data and models, is currently the most advanced digital research paradigm. A digital twin builds a digital world through simulation modeling and dynamically updates the simulation model parameters with real-time data. One characteristic of this digital world is control: simulation linked to the physical system detects dynamics that are hard to observe across different time scales, and action instructions are then transmitted to add value to the physical entities. Such online data-driven simulation requires more powerful tools to analyze the internal complexity of the system and observe its dynamic characteristics in support of real-time decisions.
Factor effect identification (sensitivity analysis) is an important technique for describing the underlying input/output relationships of a simulation system; combined with experimental design, sampling and statistical inference, it identifies the factors (inputs) that significantly influence the system output. Among these techniques, sequential bifurcation and the Morris base effect (elementary effects) method are generally regarded as the two most effective. However, most traditional research on factor effect identification focuses on offline simulation models with static data and is limited to controlling the probabilities of the two types of error within a single global sensitivity analysis. In an online data-driven simulation model, the value range of a factor may change continually, for example when equipment failure in a smart manufacturing workshop reduces production capacity. The factor's previous importance classification with respect to system performance may then change as well; that is, the significance of its main effect and interaction (nonlinear) effect changes. Simply applying an offline factor effect identification method in an online setting can hardly meet the efficiency and effectiveness requirements of real-time simulation experiment analysis.
With the growing attention paid to online real-time decision making in recent years, the rise of research on real-time error control offers a route to a solution. The most advanced real-time error control methods currently include monotone generalized alpha-investing rules and discarding-spending algorithms, but these methods all assume independence across time points and do not consider testing multiple correlated hypotheses at the same time point (such as the main effect and interaction effect hypotheses considered in real-time factor effect identification). An adaptive online method is therefore needed to identify the important effects of variation factors while controlling the overall error level in real time.
Disclosure of Invention
Aiming at the defects of the online data-driven simulation analysis technology under the current digital twin background, the invention provides a factor effect online identification method and device based on hierarchical error control.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
A factor effect online identification method based on hierarchical error control comprises the following steps:
Step 1, initializing the parameters of main effect identification and interaction effect identification according to the FDR level η of the overall factor effect identification;
Step 2, for the k factors in the online data-driven simulation model, obtaining the single factor whose value range changes at any observation point t, denoting it the variation factor l_t, and then constructing a real-time design matrix, where l_t ∈ {1, 2, …, k};
Step 3, inputting the real-time design matrix into the simulation model to obtain the simulation output, and calculating the base effects of the variation factor;
Step 4, transforming the original base effect sample of the variation factor into a new base effect sample for the main effect hypothesis test and a new base effect sample for the interaction effect hypothesis test;
Step 5, independently sampling N times with replacement from the new base effect sample for the main effect hypothesis test to obtain a Bootstrap sample for the main effect hypothesis test, and independently sampling N times with replacement from the new base effect sample for the interaction effect hypothesis test to obtain a Bootstrap sample for the interaction effect hypothesis test;
repeating Step 5 B times to obtain B Bootstrap samples for the main effect hypothesis test and B Bootstrap samples for the interaction effect hypothesis test;
Step 6, calculating the corresponding test statistic for each Bootstrap sample of the main effect hypothesis test and each Bootstrap sample of the interaction effect hypothesis test;
Step 7, comparing the Bootstrap sample statistics of the main effect hypothesis test with the main effect statistic of the original base effect sample to calculate the p-value of the main effect test
Figure BDA0003473095250000021
and comparing the Bootstrap sample statistics of the interaction effect hypothesis test with the interaction effect statistic of the original base effect sample to calculate the p-value of the interaction effect test
Figure BDA0003473095250000022
Step 8, determining the observation point at which an important main effect was most recently identified
Figure BDA0003473095250000023
and the observation point at which an important interaction effect was most recently identified
Figure BDA0003473095250000024
and obtaining, by the LORD1 rule, the test level of the main effect layer at observation point t
Figure BDA0003473095250000025
and the test level of the interaction effect layer
Figure BDA0003473095250000026
Step 9, determining whether the p-value of the main effect layer
Figure BDA0003473095250000027
is greater than the test level of that layer
Figure BDA0003473095250000028
if so, the variation factor l_t is judged to have no important main effect; otherwise, it is judged to have an important main effect; and determining whether the p-value of the interaction effect layer
Figure BDA0003473095250000029
is greater than the test level of that layer
Figure BDA00034730952500000210
if so, the variation factor l_t is judged to have no important interaction effect; otherwise, it is judged to have an important interaction effect; this completes the identification of the variation factor l_t at any observation point t.
Further, initializing the parameters of main effect identification and interaction effect identification specifically comprises:
Step 1.1, according to the FDR level η of the overall factor effect identification, allocating the FDR level of main effect identification
Figure BDA0003473095250000031
and the FDR level of interaction effect identification
Figure BDA0003473095250000032
such that
Figure BDA0003473095250000033
Step 1.2, setting, for main effect identification, the maximum test level at the initial observation point t = 1
Figure BDA0003473095250000034
and defining the maximum increment of the test level in main effect identification
Figure BDA0003473095250000035
and setting, for interaction effect identification, the maximum test level at the initial observation point t = 1
Figure BDA0003473095250000036
and defining the maximum increment of the test level in interaction effect identification
Figure BDA0003473095250000037
Further, the real-time design matrix in Step 2 is constructed as follows: N randomized sampling matrices are obtained for the variation factor l_t
Figure BDA0003473095250000038
and are then stacked vertically to form the real-time design matrix
Figure BDA0003473095250000039
Each randomized sampling matrix
Figure BDA00034730952500000310
is a 2 × k matrix in which one row is the factor combination
Figure BDA00034730952500000311
and the other row is the factor combination
Figure BDA00034730952500000312
Δ is the difference between the two factor combinations (rows), taken at
Figure BDA00034730952500000313
Further, each randomized sampling matrix
Figure BDA00034730952500000314
is obtained as follows:
Step A1, constructing a (k+1) × k sampling matrix B composed of 0s and 1s such that row j+1 differs from row 1 by 1 in column j; the sampling matrix B is defined as follows:
Figure BDA00034730952500000315
Step A2, transforming the sampling matrix B into the i-th randomized form required at observation point t
Figure BDA00034730952500000316
so that each factor j ∈ {1, 2, …, k} takes values only at p discrete levels within the range [-1, 1], i.e. x_j ∈ {-1, -1+2/(p-1), …, 1}; the transformation formula is defined as follows:
Figure BDA00034730952500000317
where D* is a k-dimensional diagonal matrix whose diagonal elements are 1 or -1; x* is a factor combination in which each factor x_j takes values in {-1, -1+2/(p-1), …, 1-Δ}; J_{k+1,k} is a (k+1) × k matrix whose elements are all 1; P* is a k × k random permutation matrix in which exactly one element in each row and each column is 1 and all other elements are 0;
Step A3, selecting row 0 of
Figure BDA00034730952500000318
and the row corresponding to the variation factor l_t to form a new matrix, giving the real-time randomized sampling matrix
Figure BDA0003473095250000041
whose two rows differ by Δ only in column l_t.
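Steps A1 to A3 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patent's own formula, which sits in the omitted figures. It assumes the standard Morris randomization B* = (J x* + (Δ/2)[(2B − J)D* + J])P*, takes B as a zero row stacked on the k × k identity (chosen so that every other row differs from row 0 in exactly one column, as the text requires), and takes Δ = 2/(p−1), which gives Δ = 2/3 for p = 4. The function names `randomized_pair` and `real_time_design_matrix` are hypothetical.

```python
import numpy as np

def randomized_pair(k, l_t, p=4, rng=None):
    """Return a 2 x k matrix: row 0 and the row differing from it only in column l_t (1-based)."""
    rng = np.random.default_rng() if rng is None else rng
    delta = 2.0 / (p - 1)                              # assumed step size (2/3 when p = 4)
    grid = np.linspace(-1.0, 1.0, p)                   # the p admissible levels in [-1, 1]
    start_levels = grid[grid <= 1.0 - delta + 1e-12]   # x* may not exceed 1 - delta
    B = np.vstack([np.zeros((1, k)), np.eye(k)])       # assumed B: row j differs from row 0 in column j
    J = np.ones((k + 1, k))
    D = np.diag(rng.choice([-1.0, 1.0], size=k))       # random diagonal of +1 / -1
    x_star = rng.choice(start_levels, size=k)          # random base point
    P = np.eye(k)[rng.permutation(k)]                  # random permutation matrix
    # Standard Morris randomization (assumed); x_star broadcasts over the k + 1 rows.
    B_star = (x_star + (delta / 2.0) * ((2.0 * B - J) @ D + J)) @ P
    col = l_t - 1
    # Step A3: keep row 0 and the single row that changes column l_t.
    changed = np.flatnonzero(~np.isclose(B_star[1:, col], B_star[0, col]))
    return B_star[[0, 1 + int(changed[0])], :]

def real_time_design_matrix(k, l_t, n, p=4, rng=None):
    """Stack n randomized 2 x k pairs vertically into a (2n) x k design matrix."""
    rng = np.random.default_rng() if rng is None else rng
    return np.vstack([randomized_pair(k, l_t, p, rng) for _ in range(n)])
```

With this layout, rows 2i and 2i+1 of the design matrix form the i-th pair, so only 2 simulation runs are needed per base effect instead of the k+1 runs of a full Morris trajectory.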
Further, Step 3 specifically comprises: denormalizing the factor values in the real-time design matrix
Figure BDA00034730952500000431
to their real ranges, inputting the factor combinations into the simulation model row by row to obtain the corresponding simulation outputs, and finally calculating, for each pair of different factor combinations
Figure BDA0003473095250000042
and
Figure BDA0003473095250000043
the base effect with respect to the variation factor l_t; the i-th base effect
Figure BDA0003473095250000044
is calculated as:
Figure BDA0003473095250000045
where
Figure BDA0003473095250000046
and
Figure BDA0003473095250000047
are, respectively, row 0 and the variation factor row of the randomized sampling matrix
Figure BDA0003473095250000048
and
Figure BDA0003473095250000049
Figure BDA00034730952500000410
are, respectively, the simulation outputs of the factor combinations
Figure BDA00034730952500000411
and
Figure BDA00034730952500000412
row 0 is the row of
Figure BDA00034730952500000413
from which every other row differs by Δ in only one column.
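The base effect calculation of Step 3 can be illustrated as follows. The sketch assumes the usual Morris definition, the change in simulation output divided by the signed change in the variation factor; the patent's exact formula is in an omitted figure and may differ (for example in sign convention or in where denormalization is applied). `simulate` is a hypothetical stand-in for the simulation model, and the design matrix is assumed to hold each pair in consecutive rows.

```python
import numpy as np

def base_effects(design, simulate, l_t):
    """Compute one base effect per consecutive row pair of `design` for factor l_t (1-based)."""
    design = np.asarray(design, dtype=float)
    col = l_t - 1
    effects = []
    for i in range(0, design.shape[0], 2):
        x0, x1 = design[i], design[i + 1]     # row 0 and the variation-factor row of pair i
        step = x1[col] - x0[col]              # signed difference, +delta or -delta
        effects.append((simulate(x1) - simulate(x0)) / step)
    return np.array(effects)
```

For a toy model such as `simulate = lambda x: 3 * x[0] + x[1] ** 2`, every base effect of factor 1 equals 3, because the output is linear in that factor.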
Further, Step 4 proceeds as follows:
Step 4.1, setting the factor main effect threshold Δ_ME and interaction effect threshold Δ_IE, and setting up the main effect hypothesis test
Figure BDA00034730952500000414
and the interaction effect hypothesis test
Figure BDA00034730952500000415
Step 4.2, calculating, for the N original base effect samples
Figure BDA00034730952500000416
the sample mean
Figure BDA00034730952500000417
and the sample standard deviation
Figure BDA00034730952500000418
using the following formulas:
Figure BDA00034730952500000419
Figure BDA00034730952500000420
Step 4.3, transforming each original base effect sample (i ≤ N)
Figure BDA00034730952500000421
by the following transformation formulas into the sample required for the main effect hypothesis test
Figure BDA00034730952500000422
and the sample required for the interaction effect hypothesis test
Figure BDA00034730952500000423
Figure BDA00034730952500000424
Figure BDA00034730952500000425
Further, the test statistic of each Bootstrap sample in Step 6 is calculated as follows:
Step 6.1, calculating, for the Bootstrap sample of each main effect hypothesis test
Figure BDA00034730952500000426
the sample mean
Figure BDA00034730952500000427
and calculating, for the Bootstrap sample of each interaction effect hypothesis test
Figure BDA00034730952500000428
the sample mean
Figure BDA00034730952500000429
and the sample standard deviation
Figure BDA00034730952500000430
Figure BDA0003473095250000051
Figure BDA0003473095250000052
Figure BDA0003473095250000053
where b ∈ {1, …, B} indexes the different Bootstrap samples of the main effect hypothesis test and of the interaction effect hypothesis test;
Step 6.2, calculating, for the Bootstrap sample of each main effect hypothesis test
Figure BDA0003473095250000054
the test statistic
Figure BDA0003473095250000055
and calculating, for the Bootstrap sample of each interaction effect hypothesis test
Figure BDA0003473095250000056
the test statistic
Figure BDA0003473095250000057
Figure BDA0003473095250000058
Figure BDA0003473095250000059
where Δ_IE is the interaction effect threshold of the factor.
Further, the p-values of the main effect test and the interaction effect test are calculated as follows:
Step 7.1: calculating the main effect statistic of the original base effect sample
Figure BDA00034730952500000510
and the interaction effect statistic
Figure BDA00034730952500000511
Figure BDA00034730952500000512
Figure BDA00034730952500000513
where
Figure BDA00034730952500000514
and
Figure BDA00034730952500000515
are the sample mean and the sample standard deviation of the N original base effect samples, respectively.
Step 7.2: calculating the p-value of the main effect hypothesis test for the variation factor l_t
Figure BDA00034730952500000516
and the p-value of the interaction effect hypothesis test
Figure BDA00034730952500000517
Figure BDA00034730952500000518
Figure BDA00034730952500000519
Figure BDA00034730952500000520
denotes the test statistic of the Bootstrap sample of the b-th main effect hypothesis test
Figure BDA00034730952500000521
and
Figure BDA00034730952500000522
denotes the test statistic of the Bootstrap sample of the b-th interaction effect hypothesis test
Figure BDA00034730952500000523
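Steps 4 to 7 follow the general pattern of a null-imposed non-parametric Bootstrap test, sketched below. Because the patent's transformation formulas and test statistics are in omitted figures, every formula in this sketch is an assumption: the main effect sample is shifted so that its mean equals Δ_ME, the interaction effect sample is rescaled so that its standard deviation equals Δ_IE, a t-type statistic is used for the mean and a chi-square-type statistic for the variance, and each p-value is the share of Bootstrap statistics at least as large as the observed one. The function name `bootstrap_p_values` is hypothetical.

```python
import numpy as np

def bootstrap_p_values(d, delta_me, delta_ie, n_boot=1000, rng=None):
    """Return (p_main, p_interaction) for a base effect sample d (illustrative statistics only)."""
    rng = np.random.default_rng() if rng is None else rng
    d = np.asarray(d, dtype=float)
    n = d.size
    mean, std = d.mean(), d.std(ddof=1)

    def t_me(x):   # assumed main effect statistic: studentized distance of the mean from delta_me
        return (x.mean() - delta_me) / (x.std(ddof=1) / np.sqrt(n))

    def t_ie(x):   # assumed interaction statistic: scaled ratio of sample variance to delta_ie^2
        return (n - 1) * x.var(ddof=1) / delta_ie ** 2

    # Step 4 (assumed form): impose each null hypothesis on the sample.
    d_me = d - mean + delta_me                    # mean becomes exactly delta_me
    d_ie = mean + (d - mean) * (delta_ie / std)   # standard deviation becomes exactly delta_ie

    # Step 5: independent resampling with replacement, N draws, repeated n_boot times per test.
    idx_me = rng.integers(0, n, size=(n_boot, n))
    idx_ie = rng.integers(0, n, size=(n_boot, n))

    # Step 6: statistic of every Bootstrap sample.
    boot_me = np.array([t_me(d_me[row]) for row in idx_me])
    boot_ie = np.array([t_ie(d_ie[row]) for row in idx_ie])

    # Step 7: compare with the statistics of the original base effect sample.
    p_me = float(np.mean(boot_me >= t_me(d)))
    p_ie = float(np.mean(boot_ie >= t_ie(d)))
    return p_me, p_ie
```

A small p-value for the main effect test then indicates a mean base effect above Δ_ME, and a small p-value for the interaction effect test indicates a base effect spread above Δ_IE.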
Further, Step 8 specifically comprises:
Step 8.1, generating a monotonically non-increasing, non-negative sequence
Figure BDA00034730952500000524
such that
Figure BDA00034730952500000525
Step 8.2, determining, among the first t-1 observation points, the observation point at which an important main effect was most recently identified
Figure BDA00034730952500000526
if no important main effect has been identified so far, setting
Figure BDA00034730952500000527
and then calculating the test level of the main effect layer at observation point t
Figure BDA00034730952500000528
Figure BDA0003473095250000061
Step 8.3, determining, among the first t-1 observation points, the observation point at which an important interaction effect was most recently identified
Figure BDA0003473095250000062
if no important interaction effect has been identified so far, setting
Figure BDA0003473095250000063
and then calculating the test level of the interaction effect layer at observation point t
Figure BDA0003473095250000064
Figure BDA0003473095250000065
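The per-layer test level of Step 8 can be sketched as below, following the commonly cited form of the LORD1 rule: before any discovery the level is γ_t·w0, and after the most recent discovery at time τ it is γ_{t−τ}·b0. The patent's own formula is in an omitted figure, so this form, the sequence γ_j = 6/(π²j²) (non-negative, non-increasing, summing to 1) and the names `w0` (maximum initial test level) and `b0` (maximum increment) are assumptions. The same function would be run separately for the main effect layer and the interaction effect layer.

```python
import math

def gamma(j):
    """Assumed sequence: non-negative, monotonically non-increasing, sums to 1 over j = 1, 2, ..."""
    return 6.0 / (math.pi ** 2 * j ** 2)

def lord1_level(t, last_discovery, w0, b0):
    """Test level at observation point t; last_discovery = 0 means nothing identified so far."""
    if last_discovery == 0:
        return gamma(t) * w0
    return gamma(t - last_discovery) * b0

def run_layer(p_values, w0, b0):
    """Apply the rule to one layer's p-value stream; True marks an important effect."""
    tau = 0
    decisions = []
    for t, p in enumerate(p_values, start=1):
        alpha_t = lord1_level(t, tau, w0, b0)
        important = p <= alpha_t          # p greater than the level means "not important" (Step 9)
        decisions.append(important)
        if important:
            tau = t                       # remember the most recent identification
    return decisions
```

For example, with `w0 = b0 = 0.0125` (an illustrative split of a layer level of 0.025), the p-value stream `[0.0, 0.5, 0.001]` gives `[True, False, True]`.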
A factor effect online identification device based on hierarchical error control comprises a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to implement the factor effect online identification method based on hierarchical error control according to any one of the above technical solutions.
Compared with the prior art, the invention has the following beneficial effects. The invention provides a factor effect online identification method based on hierarchical error control. When the value range of a single factor of the dynamic simulation model is observed to change in real time, a real-time design matrix is constructed and input into the simulation model to obtain a base effect sample of the variation factor; the base effect sample is transformed and resampled into Bootstrap samples, which are compared with the original base effect sample to obtain the p-values of the hypothesis tests for the main effect and the interaction (nonlinear) effect; and test levels are assigned to the main effect and interaction (nonlinear) effect hypothesis tests based on the historical factor effect identification results, so that the overall error level is controlled while the importance of the factor's main effect and interaction (nonlinear) effect is determined. The method simplifies the design matrix of the Morris base effect method to suit effect identification for a single variation factor, and can improve the actual economy from k/(k+1) to 1/2. Compared with the Morris base effect method, which identifies factors directly from the two quantities expressed as
Figure BDA0003473095250000066
the method of the invention uses the factor main effect and interaction effect thresholds Δ_ME and Δ_IE, which gives it a stronger theoretical basis and better interpretability. For the hypothesis tests of the main effect and the interaction effect, the method uses a non-parametric Bootstrap hypothesis test, so the base effect samples need not be assumed to follow a given distribution and unbiased results can be obtained. To handle the dependence between the p-values of the main effect layer and the interaction effect layer, the method adopts a hierarchical structure combined with the LORD1 method to control the error level of factor effect identification throughout the real-time process. This accommodates the different preferences of decision makers regarding main effect and interaction effect identification and identifies the important influencing factors of a simulation system in real time, helping government and enterprise departments improve key links promptly and so raise system performance.
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention;
FIG. 2 is a flow chart of the vehicle complaint-to-recall process in an embodiment of the present invention;
FIG. 3 is a graph of the parameter changes observed in 2020 in an embodiment of the present invention;
FIG. 4 is a graph of the low and high level settings of the variation factors in an embodiment of the present invention;
FIG. 5 is a diagram of the identification results for the important effects of the variation factors in an embodiment of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail. The embodiments are developed on the basis of the technical solution of the present invention and give detailed implementations and specific operating procedures to further explain that solution.
For a factor whose value range changes at any time point, the method identifies in real time the significant main effect and interaction (nonlinear) effect of that factor (the interaction (nonlinear) effect is hereinafter abbreviated as the interaction effect). The method simplifies the experimental design matrix of the Morris base effect method, obtains the base effects of the current variation factor by real-time sampling, and estimates the mean and standard deviation of the base effect sample. Because the distribution of the base effect sample is unknown, a hypothesis test based on the non-parametric Bootstrap is used to obtain the p-values of the main effect and interaction effect hypothesis tests for the current variation factor. The p-values of the main effect and the interaction effect are handled in two separate layers, and a significance level is obtained for each layer from that layer's historical identification results, in order to judge whether the main effect and the interaction effect of the current variation factor are significant. By identifying the importance of factor effects online with this hierarchical structure and controlling the error of the resulting decisions, the invention solves the real-time hypothesis testing problem in which the main effect and interaction effect hypotheses arriving in real time are not independent.
The invention is described in detail below with reference to the drawings and an embodiment.
An embodiment of the invention provides a factor effect online identification method based on hierarchical error control, whose specific steps are shown in FIG. 1. In this embodiment, an online automobile recall simulation model is constructed; the specific flow in the model from customer complaint to recall/non-recall is shown in FIG. 2. The embodiment uses published automobile recall data from 2015 to 2019 to initialize the 29 parameters of the model, uses the 2020 Chinese automobile recall data as real-time input to observe the dynamic changes of the parameters, and focuses on the real-time influence of the changed parameters on system performance (the average flow time of complaints in the system); FIG. 3 shows the parameter changes at the 10 observation points in 2020. The specific implementation comprises the following steps:
Step 1, initializing the parameters of main effect identification and interaction effect identification according to the FDR level η of the overall factor effect identification; specifically:
Step 1.1, setting the FDR level η of the overall factor effect identification to 0.05, and allocating the FDR level of main effect identification
Figure BDA0003473095250000071
and the FDR level of interaction effect identification
Figure BDA0003473095250000072
as 0.025 and 0.025, respectively;
Step 1.2, setting, for main effect identification, the maximum test level at the initial observation point t = 1
Figure BDA0003473095250000073
and defining the maximum increment of the test level in main effect identification
Figure BDA0003473095250000074
and setting, for interaction effect identification, the maximum test level at the initial observation point t = 1
Figure BDA0003473095250000075
and defining the maximum increment of the test level in interaction effect identification
Figure BDA0003473095250000076
Step 2, for the k factors in the online data-driven simulation model, obtaining the single factor whose value range changes at any observation point t, denoting it the variation factor l_t, and then constructing a real-time design matrix, where l_t ∈ {1, 2, …, k}.
In the specific example, on 23 January 2020 the parameter P_UD, numbered 1, is observed to change from 0.0285 to 0.03; the current time point is recorded as observation point t = 1 and the variation factor as l_1 = 1. N = 200 randomized sampling matrices are obtained for factor l_1 = 1
Figure BDA0003473095250000081
and stacked vertically to construct the real-time design matrix
Figure BDA0003473095250000082
where i ∈ {1, 2, …, 200};
each randomized sampling matrix
Figure BDA0003473095250000083
is a 2 × 29 matrix in which one row is the factor combination
Figure BDA00034730952500000811
and the other row is the factor combination
Figure BDA0003473095250000084
Δ is the difference between the two factor combinations (rows), taken at
Figure BDA0003473095250000085
Further, the N randomized sampling matrices for the variation factor l_1 = 1
Figure BDA0003473095250000086
are obtained by the following process, repeated N = 200 times:
Step A1, constructing a (29+1) × 29 sampling matrix B composed of 0s and 1s such that row 2 differs from row 1 by 1 in column 1; the sampling matrix B used is as follows:
Figure BDA0003473095250000087
Step A2, transforming the sampling matrix B into the i-th randomized form required at observation point t = 1
Figure BDA0003473095250000088
so that the 29 factors take values only at discrete levels within the range [-1, 1], with Δ = 2/3, i.e. in {-1, -1/3, 1/3, 1}, using the following transformation formula:
Figure BDA0003473095250000089
where D* is a diagonal matrix of dimension k = 29 whose diagonal elements are 1 or -1; x* is the factor combination
Figure BDA00034730952500000812
in which each factor takes values only in {-1, -1/3, 1/3}; J_{29+1,29} is a (29+1) × 29 matrix whose elements are all 1; P* is a 29 × 29 random permutation matrix in which exactly one element in each row and each column is 1 and all other elements are 0;
Step A3, in order to calculate the base effect of the variation factor l_1 = 1 alone, selecting from each
Figure BDA00034730952500000810
the initial input row (i.e., row 0) and the row corresponding to factor l_1 = 1 to form a new matrix, giving the real-time randomized sampling matrix
Figure BDA0003473095250000091
whose two rows differ by Δ only in column l_1 = 1; the initial input row is the row of
Figure BDA0003473095250000092
from which every other row differs by Δ in only one column.
Step 3, inputting the real-time design matrix into the simulation model to obtain the simulation output, and calculating the base effects of the variation factor.
With the expectation defined such that changing each factor from its low level to its high level does not reduce the simulation output, every factor at observation point t = 1 takes its current level as the low level and takes the low level increased (or decreased) by a proportion of 0.1 as the high level, so that the average flow time is monotonically non-decreasing.
First, the factor values in the real-time design matrix
Figure BDA0003473095250000093
are denormalized to the real range formed by the low and high levels; the factor combinations are then input into the simulation model row by row to obtain the corresponding simulation outputs, and for each pair of different factor combinations
Figure BDA0003473095250000094
and
Figure BDA0003473095250000095
the base effect with respect to the variation factor l_1 is calculated. The i-th base effect
Figure BDA0003473095250000096
is calculated as:
Figure BDA0003473095250000097
where
Figure BDA0003473095250000098
and
Figure BDA0003473095250000099
are, respectively, the initial input row and the factor row of the randomized sampling matrix
Figure BDA00034730952500000910
and
Figure BDA00034730952500000911
are, respectively, the simulation outputs of the factor combinations
Figure BDA00034730952500000912
and
Figure BDA00034730952500000913
the original base effect sample is finally obtained as
Figure BDA00034730952500000914
FIG. 4 shows the low and high level settings of the variation factors l_t at the 10 observation points; for example, for the variation factor l_1 = 1, decreasing the current level 0.03 (low level) by a proportion of 0.1 to 0.027 (high level) keeps the average flow time monotonically non-decreasing.
The formula for denormalizing the value x_j of each factor (j ∈ {1, …, 29}) is as follows:
Figure BDA00034730952500000915
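A minimal sketch of the denormalization step, assuming the usual linear map that sends -1 to the low level and 1 to the high level; the patent's own formula is in the omitted figure and may be written differently. The function name `denormalize` is hypothetical.

```python
def denormalize(x, low, high):
    """Map a normalized value x in [-1, 1] linearly onto the real range from low to high."""
    return low + (x + 1.0) / 2.0 * (high - low)
```

With the levels quoted for the variation factor l_1 = 1 (low level 0.03, high level 0.027), `denormalize(-1, 0.03, 0.027)` returns the low level and `denormalize(1, 0.03, 0.027)` returns the high level, up to floating-point rounding.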
Step 4, transforming the original base effect sample of the variation factor into a new base effect sample for the main effect hypothesis test and a new base effect sample for the interaction effect hypothesis test, as follows:
Step 4.1, setting the factor main effect threshold Δ_ME = 0.05 and the interaction effect threshold Δ_IE = 0.05, and setting up the main effect hypothesis test
Figure BDA00034730952500000916
and the interaction effect hypothesis test
Figure BDA00034730952500000917
Step 4.2, calculating, for all N = 200 original base effect samples, the sample mean
Figure BDA00034730952500000918
and the sample standard deviation
Figure BDA00034730952500000919
using the following formulas:
Figure BDA00034730952500000920
Figure BDA0003473095250000101
Step 4.3, transforming each original base effect sample (i ≤ 200)
Figure BDA0003473095250000102
into the sample required for the main effect hypothesis test
Figure BDA0003473095250000103
and the sample required for the interaction effect hypothesis test
Figure BDA0003473095250000104
using the following transformation formulas:
Figure BDA0003473095250000105
Figure BDA0003473095250000106
Step 5, from the new base effect samples for the main-effect hypothesis test
Figure BDA0003473095250000107
draw N = 200 samples independently with replacement, giving a Bootstrap sample for the main-effect hypothesis test
Figure BDA0003473095250000108
and, from the new base effect samples for the interaction-effect test
Figure BDA0003473095250000109
draw N = 200 samples independently with replacement, giving a Bootstrap sample for the interaction-effect hypothesis test
Figure BDA00034730952500001010
This step is repeated 1000 times, yielding 1000 Bootstrap samples for the main-effect hypothesis test
Figure BDA00034730952500001011
and 1000 Bootstrap samples for the interaction-effect hypothesis test
Figure BDA00034730952500001012
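The resampling in step 5 can be sketched as below. This is a minimal sketch of ordinary resampling with replacement; the function name `bootstrap_samples`, the random seed and the placeholder data are assumptions for illustration only, and the converted samples of step 4.3 would be passed in place of `transformed`.

```python
import numpy as np

def bootstrap_samples(sample, n_boot, rng):
    """Return an (n_boot, N) array; each row is one Bootstrap sample of size N
    drawn independently with replacement from the N input values."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    idx = rng.integers(0, n, size=(n_boot, n))  # random row indices per draw
    return sample[idx]

rng = np.random.default_rng(0)
# Placeholder for the N = 200 converted base effect samples (hypothetical data).
transformed = rng.normal(loc=0.05, scale=0.02, size=200)
boot = bootstrap_samples(transformed, n_boot=1000, rng=rng)
```

The same call would be made once for the main-effect samples and once for the interaction-effect samples, giving two arrays of 1000 Bootstrap samples each.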
Step 6, for each Bootstrap sample of the main-effect hypothesis test
Figure BDA00034730952500001013
and each Bootstrap sample of the interaction-effect hypothesis test
Figure BDA00034730952500001014
compute the corresponding test statistics
Figure BDA00034730952500001015
and
Figure BDA00034730952500001016
so that the two hypothesis tests each yield B Bootstrap statistics,
Figure BDA00034730952500001017
and
Figure BDA00034730952500001018
where b ∈ {1, …, B}.
Further, the test statistics
Figure BDA00034730952500001019
and
Figure BDA00034730952500001020
are calculated as follows:
Step 6.1, for each Bootstrap sample of the main-effect hypothesis test
Figure BDA00034730952500001021
calculate the sample mean
Figure BDA00034730952500001022
and, for each Bootstrap sample of the interaction-effect hypothesis test
Figure BDA00034730952500001023
calculate the sample mean
Figure BDA00034730952500001024
and the sample standard deviation
Figure BDA00034730952500001025
Figure BDA00034730952500001026
Figure BDA00034730952500001027
Figure BDA00034730952500001028
where b ∈ {1, …, B} indexes the different Bootstrap samples of the main-effect hypothesis test and of the interaction-effect hypothesis test;
Step 6.2, for each Bootstrap sample of the main-effect hypothesis test
Figure BDA0003473095250000111
calculate the test statistic
Figure BDA0003473095250000112
and, for each Bootstrap sample of the interaction-effect hypothesis test
Figure BDA0003473095250000113
calculate the test statistic
Figure BDA0003473095250000114
Figure BDA0003473095250000115
Figure BDA0003473095250000116
Step 7, compare the Bootstrap statistics of the main-effect hypothesis test with the main-effect statistic of the original base effect samples to calculate the p-value of the main-effect test
Figure BDA0003473095250000117
and compare the Bootstrap statistics of the interaction-effect hypothesis test with the interaction-effect statistic of the original base effect samples to calculate the p-value of the interaction-effect test
Figure BDA0003473095250000118
The calculation process is as follows:
Step 7.1: calculate the main-effect statistic of the original base effect samples
Figure BDA0003473095250000119
and the interaction-effect statistic
Figure BDA00034730952500001110
Figure BDA00034730952500001111
Figure BDA00034730952500001112
Step 7.2: for the variation factor l_1 = 1, calculate the p-value of the main-effect hypothesis test
Figure BDA00034730952500001113
and the p-value of the interaction-effect hypothesis test
Figure BDA00034730952500001114
Figure BDA00034730952500001115
Figure BDA00034730952500001116
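A Bootstrap p-value of the kind described in step 7 can be sketched as follows. The p-value formulas in the source are images that are not recoverable here, so this sketch assumes the common upper-tailed form, namely the fraction of Bootstrap statistics that are at least as large as the statistic of the original samples; the function name and the example numbers are hypothetical.

```python
import numpy as np

def bootstrap_p_value(boot_stats, observed_stat):
    """Fraction of Bootstrap statistics >= the observed statistic
    (assumed upper-tailed comparison)."""
    boot_stats = np.asarray(boot_stats, dtype=float)
    return float(np.mean(boot_stats >= observed_stat))

# Hypothetical statistics: two of the four Bootstrap values reach the observed one.
p = bootstrap_p_value([0.1, 0.5, 1.2, 2.0], observed_stat=1.0)
```

The same function would be applied separately to the main-effect statistics and to the interaction-effect statistics.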
Step 8, determine, as of observation point t = 1, the observation point at which an important main effect was most recently identified
Figure BDA00034730952500001117
and the observation point at which an important interaction effect was most recently identified
Figure BDA00034730952500001118
then use the LORD1 rule to obtain the test level of the main-effect layer at observation point t = 1
Figure BDA00034730952500001119
and the test level of the interaction-effect layer
Figure BDA00034730952500001120
The specific process is as follows:
Step 8.1, generate a monotonically non-increasing, non-negative sequence
Figure BDA00034730952500001121
such that
Figure BDA00034730952500001122
In this example,
Figure BDA00034730952500001123
Step 8.2, as of observation point t = 1 no important main effect has yet been identified, so the historical observation point at which an important main effect was most recently identified is
Figure BDA00034730952500001124
Calculate the test level of the main-effect layer at observation point t = 1
Figure BDA00034730952500001125
Figure BDA00034730952500001126
Step 8.3, as of observation point t = 1 no important interaction effect has yet been identified, so the historical observation point at which an important interaction effect was most recently identified is
Figure BDA0003473095250000121
Calculate the test level of the interaction-effect hypothesis test at observation point t = 1
Figure BDA0003473095250000122
using the formula
Figure BDA0003473095250000123
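For orientation, the LORD1 rule as it is usually stated in the online FDR literature can be written as below. The symbols are assumptions, because the source formulas are images that are not recoverable here: γ_j is the sequence of step 8.1, w_0 the maximum test level at the initial observation point, b_0 the maximum increment of the test level, and τ_t the most recent earlier observation point with an identified important effect (taken as 0 when there is none). The rule is applied to each layer with that layer's own w_0, b_0 and τ_t.

```latex
\alpha_t =
\begin{cases}
\gamma_t \, w_0, & \tau_t = 0 \ \text{(no important effect identified before } t\text{)},\\[4pt]
\gamma_{t-\tau_t} \, b_0, & \tau_t \ge 1,
\end{cases}
\qquad \gamma_1 \ge \gamma_2 \ge \dots \ge 0, \quad \sum_{j=1}^{\infty} \gamma_j = 1 .
```

Under these assumptions the level at t = 1 in either layer reduces to γ_1 w_0.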
Step 9, judge whether the p-value of the main-effect layer
Figure BDA0003473095250000124
is greater than the test level of that layer
Figure BDA0003473095250000125
If so, the variation factor l_t is judged to have no important main effect; otherwise, l_t is judged to have an important main effect. Likewise, judge whether the p-value of the interaction-effect layer
Figure BDA0003473095250000126
is greater than the test level of that layer
Figure BDA0003473095250000127
If so, the variation factor l_t is judged to have no important interaction effect; otherwise, l_t is judged to have an important interaction effect. This completes the identification of the variation factor l_t at observation point t.
Specifically, in this embodiment, at observation point t = 1,
Figure BDA0003473095250000128
so the variation factor P_UD is judged to have no important main effect, while
Figure BDA0003473095250000129
so the variation factor P_UD is judged to have an important interaction effect.
In this example, 10 variation factors were observed in 2020. Letting l_t denote the index of the variation factor, the above steps are repeated to identify the effect of the single variation factor at each observation point t.
Notably, when an important main effect or interaction effect has already been identified at a historical observation point before the latest observation point t, namely
Figure BDA00034730952500001210
or
Figure BDA00034730952500001211
steps 8.2 and 8.3 above must calculate the test level from that historical observation point and the maximum increment of the test level. For example, the test levels at observation point 2 are obtained as follows:
Step 8.2, as of observation point t = 2 no important main effect has yet been identified, so the historical observation point at which an important main effect was most recently identified is
Figure BDA00034730952500001212
Calculate the test level of the main-effect hypothesis test at observation point t = 2
Figure BDA00034730952500001213
using the following formula:
Figure BDA00034730952500001214
Step 8.3, because an important interaction effect was identified at observation point t = 1, the observation point at which an important interaction effect was most recently identified is
Figure BDA00034730952500001215
Calculate the test level of the interaction-effect hypothesis test at observation point t = 2
Figure BDA00034730952500001216
using the formula
Figure BDA00034730952500001217
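The way steps 8 and 9 interact across observation points can be sketched for a single layer as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the sequence γ_j = 6/(π² j²) is one possible non-increasing sequence that sums to 1 (the sequence used in the embodiment is not recoverable here), and `w0`, `b0`, `run_layer` and the p-values are hypothetical.

```python
import math

def gamma(j):
    """Assumed sequence: non-increasing, non-negative, sums to 1 over j >= 1."""
    return 6.0 / (math.pi ** 2 * j ** 2)

def run_layer(p_values, w0, b0):
    """Apply the LORD1-style level and the step 9 decision to one layer.

    Returns a list of (t, test_level, is_important) tuples.
    """
    last = 0  # most recent observation point with an important effect; 0 = none
    results = []
    for t, p in enumerate(p_values, start=1):
        level = gamma(t) * w0 if last == 0 else gamma(t - last) * b0
        important = p <= level  # step 9: p greater than the level means not important
        results.append((t, level, important))
        if important:
            last = t
    return results

# Hypothetical p-values for three observation points in one layer.
results = run_layer([0.001, 0.5, 0.01], w0=0.025, b0=0.025)
```

In this run an important effect is identified at t = 1, so the levels at t = 2 and t = 3 are computed from the distance to that observation point and the increment `b0`, which mirrors the t = 2 case described for the interaction-effect layer. The main-effect and interaction-effect layers would each run their own copy of this loop.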
FIG. 5 shows the results of identifying important main and interaction effects for the 10 variation factors. As can be seen from FIG. 5, the method of the invention takes into account both the mean and the standard deviation indices of the Morris base effect method, generates the p-values of the main-effect and interaction-effect hypothesis tests non-parametrically, allocates test levels dynamically with a degree of flexibility, and can accurately identify the important effects of the variation factors, which helps management departments adjust the key links of the corresponding real system in time.
The invention goes beyond traditional offline factor effect identification methods. For the case of a single factor changing in an online data-driven simulation model, and given that the distribution of the base effect samples is unknown, it uses a non-parametric Bootstrap hypothesis testing method to obtain unbiased results. It adopts a layered structure combined with the LORD1 method to control the error level of factor effect identification over the whole real-time process. It thereby identifies the important influencing factors of the simulation system in real time and locates the bottleneck links of the real system, which is of significance for improving systems such as supply lines and workshops.
It should be noted that the parameter values in the foregoing embodiments do not limit the scope of the invention; they may be set and adjusted according to actual needs.
The hierarchical structure adopted by the invention keeps the Online-FDR of each layer below a set level, so that the overall Online-FDR is always controlled below η and is not affected by dependence between the p-values of the main-effect layer and the interaction-effect layer. The theoretical basis is the following theorem.
Theorem: If the Online-FDRs of the main-effect layer and the interaction-effect layer are controlled within
Figure BDA0003473095250000131
and
Figure BDA0003473095250000132
respectively, and these levels satisfy
Figure BDA0003473095250000133
then for any observation point
Figure BDA0003473095250000134
the overall Online-FDR of the hypothesis tests satisfies
Figure BDA0003473095250000135
Proof: At observation point t, if the layer-wise Online-FDRs are controlled within
Figure BDA00034730952500001314
and
Figure BDA00034730952500001316
respectively, the following is obtained:
Figure BDA00034730952500001315
where
Figure BDA0003473095250000136
and
Figure BDA0003473095250000137
are the numbers of important main effects and important interaction effects identified up to observation point t, respectively, and
Figure BDA0003473095250000138
and
Figure BDA0003473095250000139
are the numbers of main effects and interaction effects falsely identified as important, respectively. Therefore, for the first 2t factor effect tests, the overall Online-FDR is
Figure BDA00034730952500001310
The first inequality holds because
Figure BDA00034730952500001311
and
Figure BDA00034730952500001312
The second inequality follows from the premise. Therefore, when
Figure BDA00034730952500001313
holds, FDR(t) < η.
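The chain of inequalities in the proof can be written out as the following sketch. The symbols are assumed names, since the source formulas are images that are not recoverable here: R^{ME}(t) and R^{IE}(t) are the numbers of effects identified as important up to t, V^{ME}(t) and V^{IE}(t) the numbers falsely identified, and η^{ME}, η^{IE} the layer-wise FDR levels. The usual convention that a ratio counts as 0 when nothing has been identified is also assumed.

```latex
\mathrm{FDR}(t)
= \mathbb{E}\!\left[\frac{V^{ME}(t)+V^{IE}(t)}{\max\{R^{ME}(t)+R^{IE}(t),\,1\}}\right]
\le \mathbb{E}\!\left[\frac{V^{ME}(t)}{\max\{R^{ME}(t),\,1\}}\right]
  + \mathbb{E}\!\left[\frac{V^{IE}(t)}{\max\{R^{IE}(t),\,1\}}\right]
\le \eta^{ME} + \eta^{IE}.
```

The first step uses R^{ME}(t) ≤ R^{ME}(t) + R^{IE}(t) and R^{IE}(t) ≤ R^{ME}(t) + R^{IE}(t), and it needs no independence between the two layers; the second step is the layer-wise control assumed in the premise. The theorem's condition relating η^{ME} + η^{IE} to η then gives the stated bound on the overall Online-FDR.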

Claims (10)

1. A factor effect online identification method based on hierarchical error control is characterized by comprising the following steps:
Step 1, initializing the parameters for main-effect identification and interaction-effect identification according to the FDR level η of the overall factor effect identification;
Step 2, for the k factors in the online data-driven simulation model, acquiring the single factor whose value changes within its range at any observation point t, denoting it as the variation factor l_t, and then constructing a real-time design matrix; wherein the variation factor l_t ∈ {1, 2, …, k};
Step 3, inputting the real-time design matrix into a simulation model to obtain simulation output, and calculating the base effect of the variation factor;
Step 4, converting the original base effect samples of the variation factor to obtain new base effect samples for the main-effect hypothesis test and new base effect samples for the interaction-effect test;
Step 5, independently sampling with replacement N times from the new base effect samples for the main-effect hypothesis test to obtain a Bootstrap sample for the main-effect hypothesis test, and independently sampling with replacement N times from the new base effect samples for the interaction-effect test to obtain a Bootstrap sample for the interaction-effect hypothesis test;
repeating step 5 B times to obtain B Bootstrap samples for the main-effect hypothesis test and B Bootstrap samples for the interaction-effect hypothesis test, respectively;
Step 6, computing statistics on each Bootstrap sample of the main-effect hypothesis test and each Bootstrap sample of the interaction-effect hypothesis test to obtain the corresponding test statistics;
Step 7, comparing the Bootstrap statistics of the main-effect hypothesis test with the main-effect statistic of the original base effect samples to calculate the p-value of the main-effect test
Figure FDA0003473095240000011
and comparing the Bootstrap statistics of the interaction-effect hypothesis test with the interaction-effect statistic of the original base effect samples to calculate the p-value of the interaction-effect test
Figure FDA0003473095240000012
Step 8, determining the observation point at which an important main effect was most recently identified
Figure FDA0003473095240000013
and the observation point at which an important interaction effect was most recently identified
Figure FDA0003473095240000014
and using the LORD1 rule to obtain, respectively, the test level of the main-effect layer at observation point t
Figure FDA0003473095240000015
and the test level of the interaction-effect layer
Figure FDA0003473095240000016
Step 9, judging whether the p-value of the main-effect layer
Figure FDA0003473095240000017
is greater than the test level of that layer
Figure FDA0003473095240000018
and, if so, judging that the variation factor l_t has no important main effect, and otherwise judging that the variation factor l_t has an important main effect; and judging whether the p-value of the interaction-effect layer
Figure FDA0003473095240000019
is greater than the test level of that layer
Figure FDA00034730952400000110
and, if so, judging that the variation factor l_t has no important interaction effect, and otherwise judging that the variation factor l_t has an important interaction effect; thereby completing the identification of the variation factor l_t at any observation point t.
2. The method according to claim 1, wherein the initialization of parameters for the recognition of the main effect and the recognition of the interaction effect is specifically:
Step 1.1, according to the FDR level η of the overall factor effect identification, allocating the FDR level of main-effect identification
Figure FDA0003473095240000021
and the FDR level of interaction-effect identification
Figure FDA0003473095240000022
such that
Figure FDA0003473095240000023
Step 1.2, for main-effect identification, initializing the maximum test level at observation point t = 1
Figure FDA0003473095240000024
and defining the maximum increment of the test level
Figure FDA0003473095240000025
and, for interaction-effect identification, initializing the maximum test level at observation point t = 1
Figure FDA0003473095240000026
and defining the maximum increment of the test level
Figure FDA0003473095240000027
3. The method of claim 1, wherein the real-time design matrix in step 2 is constructed by: obtaining N randomized sampling matrices for the variation factor l_t
Figure FDA0003473095240000028
and stacking them vertically to construct the real-time design matrix
Figure FDA0003473095240000029
wherein each randomized sampling matrix
Figure FDA00034730952400000210
is a 2 × k matrix in which one row is the factor combination
Figure FDA00034730952400000211
the other row is the factor combination
Figure FDA00034730952400000212
and Δ is the difference between the two row factor combinations in
Figure FDA00034730952400000213
4. The method of claim 3, wherein each randomized sampling matrix
Figure FDA00034730952400000214
is obtained by the following steps:
Step A1, constructing a (k+1) × k sampling matrix B composed of 0s and 1s such that the (j+1)-th row differs from the 1st row by 1 in the j-th column, the sampling matrix B being defined as follows:
Figure FDA00034730952400000215
Step A2, converting the sampling matrix B into the i-th randomized form required at observation point t
Figure FDA00034730952400000216
such that each factor j ∈ {1, 2, …, k} takes values only at p discrete levels within the range [−1, 1], i.e. x_j ∈ {−1, −1+2/(p−1), …, 1}, the conversion formula being defined as follows:
Figure FDA00034730952400000217
wherein D* is a k-dimensional diagonal matrix whose diagonal elements are 1 or −1; each factor x_j in x* takes values in {−1, −1+2/(p−1), …, 1−Δ}; J_{k+1,k} is a (k+1) × k matrix whose elements are all 1; and P* is a k × k random permutation matrix in which exactly one element in each row and each column is 1 and the remaining elements are 0;
Step A3, selecting from
Figure FDA0003473095240000031
the 0th row and the row corresponding to the variation factor l_t to form a new matrix, thereby obtaining the randomized real-time sampling matrix
Figure FDA0003473095240000032
wherein the two rows of the matrix differ by Δ only in the l_t-th column.
5. The method according to claim 1, wherein step 3 specifically comprises: inverse-normalizing the factor values in the real-time design matrix
Figure FDA0003473095240000033
to their real ranges, inputting the factor combinations into the simulation model row by row to obtain the corresponding simulation outputs, and finally calculating, for each pair of different factor combinations
Figure FDA0003473095240000034
and
Figure FDA0003473095240000035
the base effect with respect to the variation factor l_t, the i-th base effect
Figure FDA0003473095240000036
being calculated as:
Figure FDA0003473095240000037
wherein
Figure FDA0003473095240000038
and
Figure FDA0003473095240000039
are, respectively, the 0th row and the variation-factor row of the randomized sampling matrix
Figure FDA00034730952400000310
and
Figure FDA00034730952400000311
Figure FDA00034730952400000312
are, respectively, the simulation outputs of the factor combinations
Figure FDA00034730952400000313
and
Figure FDA00034730952400000314
and wherein the 0th row of
Figure FDA00034730952400000315
differs from each of the other rows by Δ in only one column.
6. The method according to claim 1, wherein the step 4 comprises the following specific processes:
Step 4.1, setting the factor main-effect threshold Δ_ME and the interaction-effect threshold Δ_IE, and setting up the main-effect hypothesis test
Figure FDA00034730952400000316
and the interaction-effect hypothesis test
Figure FDA00034730952400000317
Step 4.2, calculating, for the N original base effect samples
Figure FDA00034730952400000318
the sample mean
Figure FDA00034730952400000319
and the sample standard deviation
Figure FDA00034730952400000320
by the following formulas:
Figure FDA00034730952400000321
Figure FDA00034730952400000322
Step 4.3, converting each original base effect sample (i ≤ N)
Figure FDA00034730952400000323
by the following conversion formulas into the sample required for the main-effect hypothesis test
Figure FDA00034730952400000324
and the sample required for the interaction-effect hypothesis test
Figure FDA00034730952400000325
Figure FDA00034730952400000326
Figure FDA00034730952400000327
7. The method of claim 1, wherein the test statistics in step 6 are calculated from each Bootstrap sample as follows:
Step 6.1, for each Bootstrap sample of the main-effect hypothesis test
Figure FDA0003473095240000041
calculating the sample mean
Figure FDA0003473095240000042
and, for each Bootstrap sample of the interaction-effect hypothesis test
Figure FDA0003473095240000043
calculating the sample mean
Figure FDA0003473095240000044
and the sample standard deviation
Figure FDA0003473095240000045
Figure FDA0003473095240000046
Figure FDA0003473095240000047
Figure FDA0003473095240000048
wherein b ∈ {1, …, B} indexes the different Bootstrap samples of the main-effect hypothesis test and of the interaction-effect hypothesis test;
Step 6.2, for each Bootstrap sample of the main-effect hypothesis test
Figure FDA0003473095240000049
calculating the test statistic
Figure FDA00034730952400000410
and, for each Bootstrap sample of the interaction-effect hypothesis test
Figure FDA00034730952400000411
calculating the test statistic
Figure FDA00034730952400000412
Figure FDA00034730952400000413
Figure FDA00034730952400000414
where Δ_IE is the factor interaction-effect threshold.
8. The method of claim 1, wherein the p-values of the main-effect test and the interaction-effect test are calculated as follows:
Step 7.1: calculating the main-effect statistic of the original base effect samples
Figure FDA00034730952400000415
and the interaction-effect statistic
Figure FDA00034730952400000416
Figure FDA00034730952400000417
Figure FDA00034730952400000418
where
Figure FDA00034730952400000419
and
Figure FDA00034730952400000420
are, respectively, the sample mean and the sample standard deviation of the N original base effect samples.
Step 7.2: calculating, for the variation factor l_t, the p-value of the main-effect hypothesis test
Figure FDA00034730952400000421
and the p-value of the interaction-effect hypothesis test
Figure FDA00034730952400000422
Figure FDA00034730952400000423
Figure FDA00034730952400000424
where
Figure FDA00034730952400000425
denotes the test statistic of the b-th Bootstrap sample of the main-effect hypothesis test
Figure FDA00034730952400000426
and
Figure FDA00034730952400000427
denotes the test statistic of the b-th Bootstrap sample of the interaction-effect hypothesis test
Figure FDA00034730952400000428
9. The method according to claim 1, wherein step 8 is specifically:
Step 8.1, generating a monotonically non-increasing, non-negative sequence
Figure FDA00034730952400000429
such that
Figure FDA00034730952400000430
Step 8.2, determining, among the first t−1 observation points, the observation point at which an important main effect was most recently identified
Figure FDA0003473095240000051
and, if no important main effect has been identified so far, setting
Figure FDA0003473095240000052
then calculating the test level of the main-effect layer at observation point t
Figure FDA0003473095240000053
Figure FDA0003473095240000054
Step 8.3, determining, among the first t−1 observation points, the observation point at which an important interaction effect was most recently identified
Figure FDA0003473095240000055
and, if no important interaction effect has been identified so far, setting
Figure FDA0003473095240000056
then calculating the test level of the interaction-effect layer at observation point t
Figure FDA0003473095240000057
Figure FDA0003473095240000058
10. An online identification device for factor effect based on hierarchical error control, comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the computer program, when executed by the processor, causes the processor to carry out the method according to any one of claims 1 to 9.
CN202210047863.9A 2022-01-17 2022-01-17 Factor effect online identification method and device based on hierarchical error control Pending CN114492007A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210047863.9A CN114492007A (en) 2022-01-17 2022-01-17 Factor effect online identification method and device based on hierarchical error control

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210047863.9A CN114492007A (en) 2022-01-17 2022-01-17 Factor effect online identification method and device based on hierarchical error control

Publications (1)

Publication Number Publication Date
CN114492007A true CN114492007A (en) 2022-05-13

Family

ID=81511317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210047863.9A Pending CN114492007A (en) 2022-01-17 2022-01-17 Factor effect online identification method and device based on hierarchical error control

Country Status (1)

Country Link
CN (1) CN114492007A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115543991A (en) * 2022-12-02 2022-12-30 湖南工商大学 Data restoration method and device based on feature sampling and related equipment
CN115543991B (en) * 2022-12-02 2023-03-10 湖南工商大学 Data restoration method and device based on feature sampling and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination