CN117354168A

CN117354168A - Hybrid ensemble method for predictive modeling of Internet of things

Info

Publication number: CN117354168A
Application number: CN202311264520.9A
Authority: CN
Inventors: 梁炯炯; 陈勇; 陈章勇
Original assignee: Yangtze River Delta Research Institute of UESTC Huzhou
Current assignee: Yangtze River Delta Research Institute of UESTC Huzhou
Priority date: 2023-09-27
Filing date: 2023-09-27
Publication date: 2024-01-05

Abstract

The invention discloses a hybrid ensemble method for predictive modeling of the Internet of things and relates to the technical field of predictive modeling. The method comprises the steps of S1, physical model construction; S2, statistical model construction; S3, model fusion; S4, first prediction generation; S5, second prediction generation; S6, consistency prediction decision; S7, confidence estimation; and S8, third prediction generation. By combining a physical model with a statistical model and fusing their results, the invention improves the accuracy of fault detection and prediction, mitigates the shortcomings of any single model, and improves overall performance. The combination makes full use of the advantages of both models: the physical model provides an intuitive understanding and prediction of system behavior, while the statistical model learns and discovers the patterns and rules of the system from historical data in a data-driven manner.

Description

Hybrid ensemble method for predictive modeling of Internet of things

Technical Field

The invention relates to the technical field of predictive modeling, in particular to a hybrid ensemble method for predictive modeling of the Internet of things.

Background

In the Internet of things, a large number of sensors, devices and objects are connected through the Internet and generate a large amount of data, which may include environmental parameters, device states, user behaviors and the like. By collecting, storing and analyzing these data, the rules, trends and patterns of the Internet of things system can be revealed and used for prediction and modeling. For example, by monitoring the sensor data of devices and analyzing their states and behaviors, possible faults or abnormal conditions can be predicted so that repair and maintenance measures can be taken in time.

Existing modeling of the Internet of things mainly comprises physical models and statistical models. A physical model describes the behavior and interactions of the Internet of things system based on physical principles and laws, whereas a statistical model is built by collecting and analyzing data from the system based on statistical principles and data analysis methods.

Due to the limitations of a single model, not all fault conditions can be accurately detected and predicted. Some faults may be missed or not found in time, which increases the time and cost of repair. A single model may also produce false fault alarms, i.e. notifications issued when no real fault exists; these cause trouble and cost because maintenance personnel may perform unnecessary interventions and maintenance operations, wasting resources and time.

Disclosure of Invention

The invention aims to provide a hybrid ensemble method for predictive modeling of the internet of things, so as to solve the problems in the background art.

In order to achieve the above purpose, the present invention provides the following technical solutions: the hybrid ensemble method for predictive modeling of the Internet of things comprises the following steps:

S1, physical model construction: constructing a mathematical expression according to the physical principle of the physical equipment to build a physical model, determining the parameter values in the model by a parameter estimation method from the collected data, and applying the model to an independent test data set so as to compare the model predictions with actual observations and evaluate the goodness of fit and predictive capability of the model;

S2, statistical model construction: collecting historical data related to the physical equipment, including its input and output variables, selecting and extracting features from the collected historical data, constructing a prediction model through a machine learning algorithm according to a specific prediction target, and constructing an optimization model according to the prediction model;

S3, model fusion: if the confidence levels of the first prediction result and the second prediction result are both high, directly selecting one of them as the output; if the confidence levels of the two prediction results are low, combining the prediction results of the two models by a voting method, taking the combined prediction result as the output, and inputting the output to the physical model and the statistical model again;

S4, first prediction generation: inputting operation parameters to the physical model according to the prediction target, and generating a first prediction result through the physical model;

S5, second prediction generation: inputting operation parameters to the prediction model according to the prediction target, and generating a second prediction result through the prediction model;

S6, consistency prediction decision: obtaining F1 scores through cross-validation as the consistency index and, when the sample size is large, calculating a confidence interval based on the properties of the normal distribution from the sample mean, the standard deviation and the sample size;

S7, confidence estimation: evaluating the confidence level of the consistency decision according to the calculated confidence interval, and judging from the confidence level whether the two sets of prediction results are consistent;

S8, third prediction generation: in step S3, if the confidence level between the predictions received the second time and the predictions received the previous time is high, directly selecting one of the predictions received the second time as the third prediction result.
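The fusion rule of step S3 can be illustrated with a minimal Python sketch. The `Prediction` type, the 0.8 threshold, the rule of picking the more confident result when both are high, and the confidence-weighted vote are all assumptions made for illustration; the text above does not fix how confidence is quantified or how the case of one high and one low confidence is handled (treated here as a vote).

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: int         # 1 = fault predicted, 0 = no fault (assumed encoding)
    confidence: float  # value in [0, 1]; how it is obtained is an assumption


def fuse(first: Prediction, second: Prediction, threshold: float = 0.8) -> int:
    """Fuse the physical-model and statistical-model predictions (step S3)."""
    if first.confidence >= threshold and second.confidence >= threshold:
        # Both results are trusted: select one directly.
        # Assumed tie-break: take the more confident of the two.
        if first.confidence >= second.confidence:
            return first.label
        return second.label
    # Otherwise combine the two results with a confidence-weighted vote.
    score = 0.0
    for p in (first, second):
        score += p.confidence if p.label == 1 else -p.confidence
    # A tied vote falls back to "no fault" (assumption).
    return 1 if score > 0 else 0


fused = fuse(Prediction(label=1, confidence=0.6), Prediction(label=0, confidence=0.3))
```

In this sketch the fused label would then be fed back to both models, as step S3 describes; that feedback is omitted here.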

Preferably, the step S1 specifically includes the following steps:

S101, first data collection: collecting physical information features of the physical equipment, and dividing them into external physical information features and internal physical information features;

S102, establishing a mathematical expression: establishing a mathematical equation that describes the behavior of the equipment based on the physical principle and the internal physical information features;

S103, parameter estimation: determining the parameter values in the model by a parameter estimation method according to the collected physical information features;

S104, model verification: applying the model to an independent test data set, comparing the model predictions with actual observations, and evaluating the goodness of fit and predictive capability of the model.

Preferably, in step S101, external physical information features are used to construct the external shape of the physical device, and internal physical information features are used to construct the physical principles of the physical device.
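Steps S102 to S104 can be sketched in Python as follows. The example assumes, purely for illustration, that the equipment follows Newton's law of cooling, T(t) = T_env + (T0 - T_env)·exp(-k·t), and that the cooling constant k is the parameter to estimate; the function names and the synthetic, noise-free data are hypothetical and not taken from the text above.

```python
import math


def simulate_temperature(t, t_env, t0, k):
    """S102: assumed physical equation (Newton's law of cooling)."""
    return t_env + (t0 - t_env) * math.exp(-k * t)


def estimate_k(times, temps, t_env, t0):
    """S103: least-squares estimate of k from ln((T - T_env)/(T0 - T_env)) = -k*t."""
    num = 0.0
    den = 0.0
    for t, temp in zip(times, temps):
        y = math.log((temp - t_env) / (t0 - t_env))
        num += t * y
        den += t * t
    return -num / den


def rmse(predicted, observed):
    """S104: root-mean-square error between predictions and observations."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )


# Synthetic training and test data generated with k = 0.1 (not real measurements).
train_t = [1.0, 2.0, 3.0, 4.0, 5.0]
train_temp = [simulate_temperature(t, 20.0, 80.0, 0.1) for t in train_t]
k_hat = estimate_k(train_t, train_temp, 20.0, 80.0)

test_t = [6.0, 7.0, 8.0]
test_temp = [simulate_temperature(t, 20.0, 80.0, 0.1) for t in test_t]
test_error = rmse(
    [simulate_temperature(t, 20.0, 80.0, k_hat) for t in test_t], test_temp
)
```

With real sensor data the observations would be noisy, so the held-out error would be nonzero and would serve as the fit and prediction measure of S104.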

Preferably, the step S2 specifically includes the following steps:

S201, second data collection: collecting historical data related to the physical equipment, including input and output variables;

S202, feature selection and extraction: evaluating the degree of association between each feature and the prediction target by calculating their correlation, evaluating feature importance through the feature importance index of a decision tree model, penalizing unimportant features by applying regularization, and automatically selecting important features during model training;

S203, data preprocessing: preprocessing the selected features, including missing value handling, outlier handling and standardization;

S204, model training: training the model with a decision tree learning algorithm;

S205, model evaluation: evaluating the trained model by cross-validation;

S206, constructing an optimization model: constructing an optimization model according to the output of the prediction model and the specific prediction target, and determining the optimal combination or setting of input variables through the optimization model;

S207, model optimization feedback: further optimizing the optimization model by a model optimization feedback method.

Preferably, in step S201, the input variables of the physical device include sensor data, operation parameters, and external environmental factors, and the output variables of the physical device include device status data, device performance indicators, and fault records.
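Part of steps S202 and S203 can be sketched in plain Python. The sketch covers only the correlation-based part of feature selection and a simple preprocessing pass (mean imputation followed by z-score standardization); decision tree importance, regularization and outlier handling are left out. The 0.5 correlation threshold, the feature names and the sample values are assumptions for illustration.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation between a feature column and the prediction target."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(vx * vy)


def select_features(columns, target, min_abs_corr=0.5):
    """S202 (partial): keep features whose |correlation| reaches the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= min_abs_corr
    ]


def impute_and_standardize(values):
    """S203 (partial): fill missing values with the mean, then z-score."""
    present = [v for v in values if v is not None]
    fill = mean(present)
    filled = [fill if v is None else v for v in values]
    m, sd = mean(filled), pstdev(filled)
    if sd == 0:
        return [0.0] * len(filled)
    return [(v - m) / sd for v in filled]


# Hypothetical sensor columns and a binary fault record as the target.
columns = {
    "vibration": [1.0, 2.0, 3.0, 4.0, 5.0],
    "humidity": [2.0, 5.0, 1.0, 4.0, 3.0],
}
fault = [0, 0, 1, 1, 1]
selected = select_features(columns, fault)
scaled = impute_and_standardize([1.0, None, 3.0])
```

Here only `vibration` passes the threshold, and the imputed middle value maps to zero after standardization.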

Preferably, in step S207, the model optimization feedback method specifically includes the following steps:

A1, transmitting the second prediction result: transmitting the second prediction result generated by the prediction model to an operator;

A2, changing the optimization model: constructing constraint conditions according to the equipment state data fed back by the logic controller, and optimizing the prediction model with a constrained optimization algorithm so as to change the optimization model;

A3, the operator issues control: the operator issues a control command to the logic controller according to the second prediction result;

A4, the logic controller sends out a control signal: the logic controller sends out the corresponding control signal according to the control command issued by the operator, transmits the control signal to the physical equipment, receives the equipment state data fed back by the physical equipment during operation, and transmits the received equipment state data to the optimization model;

A5, operating the physical equipment: the physical equipment starts its corresponding parts according to the control signal sent by the logic controller;

A6, the physical equipment generates state data: the physical equipment generates output variables during operation, the output variables including the state data of the equipment, and feeds the generated state data back to the logic controller.
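The feedback loop A1 to A6 can be sketched as a small simulation. Everything concrete in it is hypothetical: the class and function names, the toy relation between control signal and temperature, the temperature limit used as the constraint, and the automatic acceptance of the prediction by the "operator". It only illustrates how state data fed back through the logic controller can tighten a constraint in the optimization model.

```python
def predict_fault_risk(setpoint):
    """Hypothetical stand-in for the prediction model (second prediction result)."""
    return min(1.0, setpoint / 100.0)


class PhysicalDevice:
    """A5/A6: runs on a control signal and reports state data."""

    def run(self, signal):
        # Assumed toy relation between control signal and temperature.
        return {"temperature": 20.0 + 0.5 * signal}


class LogicController:
    """A4: turns an operator command into a control signal and relays state data."""

    def __init__(self, device):
        self.device = device

    def execute(self, command):
        signal = command["setpoint"]
        return self.device.run(signal)


def choose_setpoint(candidates, upper_bound):
    """Optimization model: largest candidate strictly below the current bound."""
    feasible = [c for c in candidates if c < upper_bound]
    if not feasible:
        raise ValueError("no candidate satisfies the current constraint")
    return max(feasible)


def feedback_round(candidates, upper_bound, controller, temp_limit):
    setpoint = choose_setpoint(candidates, upper_bound)
    risk = predict_fault_risk(setpoint)              # A1: prediction sent to operator
    command = {"setpoint": setpoint, "risk": risk}   # A3: operator issues command
    state = controller.execute(command)              # A4 to A6: run and feed back
    if state["temperature"] > temp_limit:            # A2: tighten the constraint
        upper_bound = setpoint
    return setpoint, upper_bound, state


controller = LogicController(PhysicalDevice())
bound = float("inf")
history = []
for _ in range(2):
    setpoint, bound, state = feedback_round(
        [40.0, 60.0, 80.0], bound, controller, temp_limit=55.0
    )
    history.append(setpoint)
```

In this run the first round tries the highest setpoint, the reported temperature exceeds the limit, and the second round settles on the next feasible setpoint.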

Preferably, the step S6 specifically includes the following steps:

S601, calculating F1 scores: obtaining F1 scores through cross-validation;

S602, calculating the mean and standard deviation of the F1 scores: calculating the mean value and the standard deviation of the F1 scores;

S603, determining a confidence level: setting the confidence level to 95%, i.e. a significance level of 0.05;

S604, calculating the upper and lower limits of the confidence interval: calculating the upper limit and the lower limit of the confidence interval through an upper limit calculation formula and a lower limit calculation formula, wherein the upper limit calculation formula is specifically as follows:

$$CI_{upper} = \bar{x} + z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$

where $\bar{x}$ denotes the mean value of the F1 scores, $z_{\alpha/2}$ denotes the critical value determined according to the required confidence level, $s$ denotes the standard deviation of the F1 scores, $n$ denotes the number of F1 scores, and $CI_{upper}$ denotes the upper limit of the confidence interval;

the lower limit calculation formula is specifically as follows:

$$CI_{lower} = \bar{x} - z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$

where $\bar{x}$ denotes the mean value of the F1 scores, $z_{\alpha/2}$ denotes the critical value determined according to the required confidence level, $s$ denotes the standard deviation of the F1 scores, $n$ denotes the number of F1 scores, and $CI_{lower}$ denotes the lower limit of the confidence interval.
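Steps S601 to S604 can be sketched with the Python standard library. The sketch assumes binary labels (1 = fault), uses the sample standard deviation for s, and takes the critical value from the standard normal distribution, in line with the normal approximation named in step S6; the fold scores are made-up numbers. For only a handful of folds a Student t critical value would usually be the more cautious choice.

```python
import math
from statistics import NormalDist, mean, stdev


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def f1_confidence_interval(scores, confidence=0.95):
    """Return (CI_lower, CI_upper) = mean -/+ z * s / sqrt(n)."""
    n = len(scores)
    x_bar = mean(scores)
    s = stdev(scores)  # sample standard deviation of the F1 scores
    # Two-sided critical value, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical F1 scores from five cross-validation folds.
fold_scores = [0.80, 0.82, 0.78, 0.81, 0.79]
ci_lower, ci_upper = f1_confidence_interval(fold_scores)
```

A narrow interval around a high mean suggests that the consistency index is stable across folds, which is the input step S7 uses to judge consistency.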

Preferably, in step S3, the voting method specifically includes the steps of:

step one, locally processing the data of the voters through a randomized response perturbation technique, adding noise so as to form perturbed data, and then submitting the perturbed data to a blockchain network;

step two, dividing the ballot weights into two levels, estimating the statistical value of the ballot data by using a maximum likelihood function, and estimating a quota value according to the estimated statistical value, wherein the statistical value calculation formula of the ballot data is specifically as follows:

[formula not reproduced in the source text]

where $C_{w_i}$ denotes the statistical value of the ballot data, $R_w$ denotes the transition probability matrix of the weight data, and the estimate term denotes the estimated statistical value of the ballot data for each weight-level group;

the quota value calculation formula is specifically:

[formula not reproduced in the source text]

where the quota value is computed from the estimated statistical value of the ballot data for each weight-level group;

step three, finally, the voting manager downloads the submitted perturbed data from the blockchain network and calculates the result: each ballot weight $w_i$ is multiplied by the corresponding voter intent data $p_i$ and the products are summed to obtain the final result $\sum_i w_i p_i$. The result is then compared with the estimated quota value; if the result is greater than or equal to the quota value, more than half of the voters approve the proposed scheme, and the scheme passes.
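The three voting steps can be sketched in Python under explicit assumptions, since the source formulas are not available. The perturbation is modeled as Warner-style randomized response (a vote is reported truthfully with probability `p_truth` and flipped otherwise), the proportion of "yes" votes is recovered with the standard estimator (λ - (1 - p)) / (2p - 1), and the quota is assumed to be half of the total ballot weight. The blockchain submission and download are not modeled, and all function names are hypothetical.

```python
import random


def perturb(vote, p_truth, rng):
    """Step one: report the true vote with probability p_truth, else flip it."""
    return vote if rng.random() < p_truth else 1 - vote


def estimate_yes_count(reports, p_truth):
    """Step two: estimate the true number of 'yes' votes from perturbed reports.

    Observed rate lam = p*pi + (1 - p)*(1 - pi), so pi = (lam - (1 - p)) / (2p - 1).
    Requires p_truth != 0.5.
    """
    n = len(reports)
    lam = sum(reports) / n
    pi_hat = (lam - (1 - p_truth)) / (2 * p_truth - 1)
    pi_hat = min(1.0, max(0.0, pi_hat))  # clamp to a valid proportion
    return pi_hat * n


def tally(groups, p_truth):
    """Step three: weighted sum of estimated 'yes' votes compared with the quota.

    groups maps a ballot weight to the perturbed reports of that weight level.
    The quota (half of the total weight) is an assumption.
    """
    weighted_yes = sum(
        w * estimate_yes_count(reports, p_truth) for w, reports in groups.items()
    )
    quota = 0.5 * sum(w * len(reports) for w, reports in groups.items())
    return weighted_yes, quota, weighted_yes >= quota


rng = random.Random(0)
true_votes = {1: [1, 1, 0], 2: [1, 0]}  # two weight levels, hypothetical ballots
# p_truth = 1.0 disables the noise so that this example stays deterministic.
reports = {w: [perturb(v, 1.0, rng) for v in votes] for w, votes in true_votes.items()}
weighted_yes, quota, passed = tally(reports, p_truth=1.0)
```

With a realistic `p_truth` such as 0.75 the individual reports no longer reveal a voter's choice, while the group-level estimate remains unbiased before clamping.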

Compared with the prior art, the invention has the beneficial effects that:

1. The invention combines the physical model with the statistical model and fuses their results to improve the accuracy of fault detection and prediction. Integrating the results of several models reduces the shortcomings of any single model and improves overall performance. The combination makes full use of the advantages of both models: the physical model provides an intuitive understanding and prediction of system behavior, while the statistical model learns and discovers the patterns and rules of the system from historical data in a data-driven manner;

2. The invention combines the prediction results of the two models through a voting method, which makes full use of the respective strengths of the models and thereby improves overall accuracy. If one model performs well under certain conditions and the other performs well under other conditions, voting considers both results together to obtain a more accurate prediction. In addition, a complex model carries a risk of overfitting, i.e. it performs well on training data but generalizes poorly to new samples; combining several different models through voting reduces this risk and improves generalization.

Drawings

FIG. 1 is a flowchart of an overall method provided by an embodiment of the present invention;

FIG. 2 is a flowchart of a method for constructing a physical model according to an embodiment of the present invention;

FIG. 3 is a flowchart of a method for statistical model construction provided by an embodiment of the present invention;

FIG. 4 is a flowchart of a method for consistent prediction decision provided by an embodiment of the present invention;

FIG. 5 is a flowchart of a method for model optimization feedback according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.

Referring to fig. 1-5, the present invention provides a technical solution: the hybrid ensemble method for predictive modeling of the Internet of things comprises the following steps:

The step S1 specifically comprises the following steps:

S104, model verification: applying the model to an independent test data set, comparing the model predictions with actual observations, and evaluating the goodness of fit and predictive capability of the model;

in step S101, external physical information features are used to construct the shape of the physical device, and internal physical information features are used to construct the physical principles of the physical device;

the step S2 specifically includes the following steps:

S207, model optimization feedback: further optimizing the optimization model by a model optimization feedback method;

in step S201, input variables of the physical device include sensor data, operation parameters, and external environmental factors, and output variables of the physical device include device state data, device performance index, and fault record;

in step S207, the model optimization feedback method specifically includes the steps of:

A6, the physical equipment generates state data: the physical equipment generates output variables during operation, the output variables including the state data of the equipment, and feeds the generated state data back to the logic controller;

the step S6 specifically comprises the following steps:

S601, calculating F1 scores: obtaining F1 scores through cross-validation;

the lower limit calculation formula is specifically as follows:

$$CI_{lower} = \bar{x} - z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$

where $\bar{x}$ denotes the mean value of the F1 scores, $z_{\alpha/2}$ denotes the critical value determined according to the required confidence level, $s$ denotes the standard deviation of the F1 scores, $n$ denotes the number of F1 scores, and $CI_{lower}$ denotes the lower limit of the confidence interval;

in step S3, the voting method specifically includes the steps of:

the quota value calculation formula is specifically:

step three, finally, the voting manager downloads the submitted perturbed data from the blockchain network and calculates the result: each ballot weight $w_i$ is multiplied by the corresponding voter intent data $p_i$ and the products are summed to obtain the final result $\sum_i w_i p_i$. The result is then compared with the estimated quota value; if the result is greater than or equal to the quota value, more than half of the voters approve the proposed scheme, and the scheme passes.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A hybrid ensemble method for predictive modeling of the Internet of things, characterized by comprising the following steps:

2. The hybrid ensemble method for predictive modeling of the internet of things of claim 1, wherein: the step S1 specifically comprises the following steps:

3. The hybrid ensemble method for predictive modeling of the internet of things of claim 2, wherein: in step S101, external physical information features are used to construct the appearance of the physical device, and internal physical information features are used to construct the physical principles of the physical device.

4. The hybrid ensemble method for predictive modeling of the internet of things of claim 1, wherein: the step S2 specifically includes the following steps:

5. The hybrid ensemble method for predictive modeling of the internet of things of claim 4, wherein: in step S201, input variables of the physical device include sensor data, operation parameters, and external environmental factors, and output variables of the physical device include device status data, device performance indicators, and fault records.

6. The hybrid ensemble method for predictive modeling of the internet of things of claim 4, wherein: in step S207, the model optimization feedback method specifically includes the steps of:

7. The hybrid ensemble method for predictive modeling of the internet of things of claim 1, wherein: the step S6 specifically comprises the following steps:

S601, calculating F1 scores: obtaining F1 scores through cross-validation;

S604, calculating the upper and lower limits of the confidence interval: calculating the upper limit and the lower limit of the confidence interval through an upper limit calculation formula and a lower limit calculation formula.

8. The hybrid ensemble method for predictive modeling of the internet of things of claim 1, wherein: in step S3, the voting method specifically includes the steps of:

step two, dividing the ballot weights into two levels, estimating the statistical value of the ballot data by using a maximum likelihood function, and estimating a quota value according to the estimated statistical value, wherein the quota value is used for judging whether the scheme is supported by a majority of the voters;

step three, finally, the voting manager downloads the submitted perturbed data from the blockchain network and calculates the result by multiplying each ballot weight by the corresponding voter intent data and summing the products to obtain the final result; the result is compared with the estimated quota value, and if the result is greater than or equal to the quota value, more than half of the voters approve the proposed scheme and the scheme passes.