CN110472197A - Method for predicting window-opening behavior with dummy variables based on a binary logistic regression model - Google Patents


Info

Publication number
CN110472197A
Authority
CN
China
Prior art keywords
variable
outdoor
indoor
prediction
windowing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910776641.9A
Other languages
Chinese (zh)
Inventor
潘嵩
崔桐
谷雅秀
吴金顺
张行星
朴咏玫
王春青
林明彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Caterie Environmental Energy Co Ltd
Original Assignee
Suzhou Caterie Environmental Energy Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Caterie Environmental Energy Co Ltd
Priority to CN201910776641.9A
Publication of CN110472197A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Operations Research (AREA)
  • Human Resources & Organizations (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Economics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Biology (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Algebra (AREA)
  • Development Economics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a method for predicting window-opening behavior with dummy variables based on a binary logistic regression model. The method comprises: multifactor analysis of variance to select the independent variables related to window-opening behavior; collinearity diagnostics to select the independent variables with low collinearity; and establishing a binary logistic regression model with dummy variables to predict window-opening behavior. The method applies dummy variables in a binary logistic regression model of window-opening behavior and improves the prediction accuracy of the model.

Description

Method for predicting window-opening behavior with dummy variables based on a binary logistic regression model
Technical field
The present invention relates to the technical field of window-opening behavior prediction, and in particular to a method for predicting window-opening behavior with dummy variables based on a binary logistic regression model.
Background art
Window-opening behavior is a binary classification problem (0 means the window is closed, 1 means it is open), so binary logistic regression is the algorithm most often used in window-opening research in China and abroad. The dependent variable is the window state (open or closed), and the independent variables are environmental factors (indoor and outdoor temperature and humidity, PM2.5 concentration, indoor CO2 concentration, wind speed, wind direction, solar radiation, etc.). Existing studies usually model and predict window-opening behavior with the independent variables in continuous form, and the regression coefficient of each independent variable explains the window's open or closed state. That is, current studies put the continuous variables directly into the regression model. For example, when outdoor relative humidity enters the model as a continuous variable, its coefficient is interpreted as the effect on the dependent variable (window opening) of each 1% increase in relative humidity; however, a 1% increase in humidity usually has a very small effect on window opening, so this interpretation has little practical significance.
A dummy variable (also called a virtual or nominal variable) is an artificially constructed variable, usually taking the value 0 or 1, that reflects a particular attribute of some variable: "1" means "yes" and "0" means "no". For an independent variable with n categories, one category is usually chosen as the reference category, which yields n-1 dummy variables. The reference can be the lowest or the highest level, and the dummy variables for the other levels are compared with it to observe their influence on window opening. For example, the continuous variable outdoor relative humidity can be discretized into bands 15% wide and converted into dummy variables: less than 20%, 20%-35%, 35%-50%, 50%-65%, 65%-80%, and greater than 80%. The regression coefficients of the resulting model can then be interpreted as the effect on the window state of outdoor relative humidity changing by a multiple of 15%. Each dummy variable represents the difference in influence on window opening between humidity levels; fitting the regression model gives one estimated coefficient per dummy variable, which makes the results easier to interpret and more practically meaningful. Similarly, parameters such as indoor carbon dioxide concentration, indoor and outdoor temperature and PM2.5 concentration can be modeled after being divided into levels, which explains in more detail how strongly each independent variable influences window opening at each level. Compared with previous studies, which explain window-opening behavior with a single regression coefficient per independent variable, this approach describes window-opening behavior in finer detail and improves the prediction accuracy of the model.
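As a minimal sketch of the discretization described above (not the patent's own code), the Python snippet below assigns outdoor relative humidity readings to the six 15%-wide bands. The names `RH_EDGES`, `RH_LABELS` and `rh_level` are hypothetical, and the text does not say which band a boundary value such as exactly 20% belongs to; here each band is assumed to include its lower edge.

```python
from bisect import bisect_right

# Upper edges (in %RH) of the first five bands; readings at or above 80% fall in the last band.
RH_EDGES = [20.0, 35.0, 50.0, 65.0, 80.0]
RH_LABELS = ["<20%", "20%-35%", "35%-50%", "50%-65%", "65%-80%", ">80%"]


def rh_level(rh):
    """Return the band index 0..5 of a reading (assumption: each band includes its lower edge)."""
    return bisect_right(RH_EDGES, rh)


readings = [12.0, 20.0, 47.5, 66.0, 93.0]
print([RH_LABELS[rh_level(r)] for r in readings])

# Six levels with "<20%" as the reference category give 6 - 1 = 5 dummy variables.
n_dummies = len(RH_LABELS) - 1
print(n_dummies)
```

Using `bisect_right` keeps the band lookup correct for any sorted list of edges, so the same helper works if a different interval width is chosen.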
Summary of the invention
The object of the present invention is to provide a method for predicting window-opening behavior with dummy variables based on a binary logistic regression model, which applies dummy variables in a binary logistic regression model of window-opening behavior and improves the prediction accuracy of the model.
To achieve the above object, the present invention provides the following solutions:
A method for predicting window-opening behavior with dummy variables based on a binary logistic regression model, comprising the following steps:
S1, multifactor analysis of variance: collect multiple indoor and outdoor variables; divide each variable into levels according to the distribution of the measured data, keeping the number of cases in each level balanced; check whether the categorical variable obtained by dividing each variable into levels follows a normal distribution; perform analysis of variance on the categorical variables that follow a normal distribution; and, according to the probability Sig value corresponding to the Wald test statistic, conclude that a variable has a significant influence on window-opening behavior if its Sig value is less than 0.05, thereby selecting the independent variables related to window-opening behavior;
S2, collinearity diagnostics: perform collinearity diagnostics on the selected independent variables related to window-opening behavior, and select the independent variables with low collinearity according to a set collinearity criterion;
S3, establishing the binary logistic regression model and predicting window-opening behavior: set dummy variables for the independent variables selected in step S2, establish the binary logistic regression model, obtain the logistic regression coefficient of each dummy variable, and substitute the coefficients into the binary logistic regression equation to obtain the predicted window-opening behavior.
Optionally, selecting the independent variables with low collinearity according to the set collinearity criterion in step S2 specifically comprises: retaining the independent variables that satisfy the set collinearity criterion as the independent variables with low collinearity, where the criterion is 0 < VIF < 5 and TOL > 0.2, VIF denotes the variance inflation factor of each independent variable, and TOL denotes its tolerance.
Optionally, in step S1, collecting multiple indoor and outdoor variables specifically comprises collecting: indoor temperature, indoor relative humidity, indoor CO2 concentration, indoor PM2.5 concentration, outdoor temperature, outdoor relative humidity, outdoor PM2.5 concentration, outdoor wind speed, outdoor wind direction, indoor-outdoor temperature difference, indoor-outdoor relative humidity difference, indoor-outdoor PM2.5 concentration difference, solar radiation and time.
Optionally, in step S2, the selected independent variables with low collinearity are: indoor temperature, indoor relative humidity, indoor CO2 concentration, indoor PM2.5 concentration, outdoor wind speed, outdoor wind direction, indoor-outdoor temperature difference, indoor-outdoor relative humidity difference, indoor-outdoor PM2.5 concentration difference, solar radiation and time.
Optionally, in step S3, establishing the binary logistic regression model, setting dummy variables for the independent variables selected in step S2, obtaining the logistic regression coefficient of each dummy variable, and substituting the coefficients into the binary logistic regression equation to obtain the predicted window-opening behavior specifically comprises:
The binary logistic regression equation is expressed as:
$$P = \frac{1}{1 + e^{-Z}}$$
where P is classified with a cut-off value of 0.5: P ≥ 0.5 indicates that the window is open, and P < 0.5 indicates that the window is closed;
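$$Z = B_{(i)}\mathrm{Time}_{(i)} + B_{(k)}\mathrm{RH}_{(k)} + B_{(k)}\mathrm{CO}_{2,(k)} + B_{(k)}\Phi_{(k)} + B_{(k)}\Delta T_{(k)} + B_{(k)}\Delta\mathrm{RH}_{(k)} + B_{(k)}\Delta\mathrm{PM2.5}_{(k)} + B_{(m)}\mathrm{PM2.5}_{(m)} + B_{(m)}V_{(m)} + B_{(n)}T_{(n)} + B_{(n)}R_{(n)} - 3.765$$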
where B denotes the regression coefficient of each independent variable and the subscript denotes the dummy variable for each level of that variable; Time denotes the time, RH the indoor relative humidity, CO2 the indoor CO2 concentration, Φ the outdoor wind direction, ΔT the indoor-outdoor temperature difference, ΔRH the indoor-outdoor relative humidity difference, ΔPM2.5 the indoor-outdoor PM2.5 concentration difference, PM2.5 the indoor PM2.5 concentration, V the outdoor wind speed, T the indoor temperature, and R the solar radiation.
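A minimal sketch of how the fitted equation turns dummy codes into a prediction: Z is the constant plus the coefficient B of whichever dummy is coded 1 for each variable (a reference level contributes nothing), P = 1/(1 + e^(-Z)), and the window is predicted open when P ≥ 0.5. Only the constant -3.765 and the indoor PM2.5 EXP(B) values (1.676, 2.289, 7.461, reported with Table 6) come from the text; restricting the model to this single variable, and the names `COEFS`, `linear_predictor`, `open_probability` and `window_state`, are assumptions for illustration.

```python
import math

INTERCEPT = -3.765  # constant term of the fitted equation in the text

# Hypothetical layout: variable -> {dummy index: B}; index 0 is the reference level
# and has no entry. B = ln(EXP(B)) for the indoor PM2.5 odds ratios quoted in the text.
COEFS = {
    "indoor_pm25": {1: math.log(1.676), 2: math.log(2.289), 3: math.log(7.461)},
}


def linear_predictor(levels, coefs=COEFS, intercept=INTERCEPT):
    """Z = constant + B of the dummy coded 1 for each variable (reference adds 0)."""
    return intercept + sum(coefs[var].get(level, 0.0) for var, level in levels.items())


def open_probability(levels):
    """P = 1 / (1 + exp(-Z))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(levels)))


def window_state(levels, cutoff=0.5):
    """1 (open) if P >= cutoff, otherwise 0 (closed)."""
    return 1 if open_probability(levels) >= cutoff else 0


for level in range(4):
    obs = {"indoor_pm25": level}
    print(level, round(open_probability(obs), 3), window_state(obs))
```

In the full model, `levels` would hold one entry per selected variable, and `COEFS` one coefficient table per variable taken from Table 6.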
According to the specific embodiments provided, the present invention discloses the following technical effects. The method for predicting window-opening behavior with dummy variables based on a binary logistic regression model sets dummy variables to represent the differences in influence on window opening between the levels of an independent variable, and fitting the regression model gives each dummy variable its own regression coefficient. The influence of each independent variable on window opening at each level can therefore be explained in more detail, which is of more practical significance than explaining window-opening behavior with a single regression coefficient per independent variable, as in previous studies.
Description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings described below are obviously only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of the method for predicting window-opening behavior with dummy variables based on a binary logistic regression model in an embodiment of the present invention;
Fig. 2 shows the distribution of indoor temperature across its levels, with the normal curve, in an embodiment of the present invention;
Fig. 3 shows the distribution of indoor relative humidity across its levels, with the normal curve, in an embodiment of the present invention;
Fig. 4 shows the distribution of indoor CO2 concentration across its levels, with the normal curve, in an embodiment of the present invention;
Fig. 5 shows the distribution of indoor PM2.5 concentration across its levels, with the normal curve, in an embodiment of the present invention;
Fig. 6 shows the distribution of outdoor temperature across its levels, with the normal curve, in an embodiment of the present invention;
Fig. 7 shows the distribution of outdoor relative humidity across its levels, with the normal curve, in an embodiment of the present invention;
Fig. 8 shows the distribution of outdoor PM2.5 concentration across its levels, with the normal curve, in an embodiment of the present invention;
Fig. 9 shows the distribution of solar radiation across its levels, with the normal curve, in an embodiment of the present invention;
Fig. 10 shows the distribution of outdoor wind speed across its levels, with the normal curve, in an embodiment of the present invention;
Fig. 11 shows the distribution of outdoor wind direction across its levels, with the normal curve, in an embodiment of the present invention;
Fig. 12 shows the distribution of the indoor-outdoor temperature difference across its levels, with the normal curve, in an embodiment of the present invention;
Fig. 13 shows the distribution of the indoor-outdoor relative humidity difference across its levels, with the normal curve, in an embodiment of the present invention;
Fig. 14 shows the distribution of the indoor-outdoor PM2.5 concentration difference across its levels, with the normal curve, in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The object of the present invention is to provide a method for predicting window-opening behavior with dummy variables based on a binary logistic regression model, which applies dummy variables in a binary logistic regression model of window-opening behavior and improves the prediction accuracy of the model.
To make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flow chart of the method for predicting window-opening behavior with dummy variables based on a binary logistic regression model in an embodiment of the present invention. As shown in Fig. 1, the method provided by the invention comprises the following steps:
S1, multifactor analysis of variance: collect multiple indoor and outdoor variables; divide each variable into levels according to the distribution of the measured data, keeping the number of cases in each level balanced; check whether the categorical variable obtained by dividing each variable into levels follows a normal distribution; perform analysis of variance on the categorical variables that follow a normal distribution; and, according to the probability Sig value corresponding to the Wald test statistic, conclude that a variable has a significant influence on window-opening behavior if its Sig value is less than 0.05, thereby selecting the independent variables related to window-opening behavior;
S2, collinearity diagnostics: perform collinearity diagnostics on the selected independent variables related to window-opening behavior, and select the independent variables with low collinearity according to a set collinearity criterion;
S3, establishing the binary logistic regression model and predicting window-opening behavior: set dummy variables for the independent variables selected in step S2, establish the binary logistic regression model, obtain the logistic regression coefficient of each dummy variable, and substitute the coefficients into the binary logistic regression equation to obtain the predicted window-opening behavior.
Specifically, in step S1, collecting multiple indoor and outdoor variables comprises collecting: indoor temperature, indoor relative humidity, indoor CO2 concentration, indoor PM2.5 concentration, outdoor temperature, outdoor relative humidity, outdoor PM2.5 concentration, outdoor wind speed, outdoor wind direction, indoor-outdoor temperature difference, indoor-outdoor relative humidity difference, indoor-outdoor PM2.5 concentration difference, solar radiation and time;
Each variable is then divided into levels according to the distribution of the measured data. To keep the later model stable, no level should contain too many or too few cases, so that the number of cases in each level is balanced. In addition, the categorical variable obtained by dividing an independent variable into levels must follow a normal distribution before analysis of variance can be used to select the variables related to the dependent variable (window state). The division details are shown in Table 1.
Table 1 Distribution of levels of the factors influencing window-opening behavior
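The text requires balanced case counts per level but does not give a rule for choosing the cut points. One possible approach, swapped in here as an assumption rather than taken from the source, is equal-frequency (quantile) binning followed by a simple balance check. The helper names and the `max_ratio` threshold are hypothetical; note that the source itself splits outdoor relative humidity at fixed 15% intervals instead.

```python
import statistics
from bisect import bisect_right
from collections import Counter


def equal_frequency_cuts(values, n_levels):
    """Cut points splitting `values` into n_levels groups of roughly equal size."""
    # statistics.quantiles returns n_levels - 1 cut points
    return statistics.quantiles(values, n=n_levels, method="inclusive")


def level_counts(values, cuts):
    """Observations per level; a value equal to a cut point goes to the upper level."""
    counts = Counter(bisect_right(cuts, v) for v in values)
    return [counts.get(j, 0) for j in range(len(cuts) + 1)]


def is_balanced(counts, max_ratio=2.0):
    """Hypothetical rule: no empty level, and the largest level is at most max_ratio x the smallest."""
    return min(counts) > 0 and max(counts) / min(counts) <= max_ratio


data = list(range(1, 101))
cuts = equal_frequency_cuts(data, 4)
counts = level_counts(data, cuts)
print(cuts, counts, is_balanced(counts))
```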
The histogram of each variable shows how its levels are distributed and whether it follows a normal distribution. Figs. 2 to 14 show that every variable follows a normal distribution after being divided into levels, so multifactor analysis of variance can be applied directly to select the independent variables significantly related to window-opening behavior.
Multifactor analysis of variance is performed on the variables that follow a normal distribution. Table 2 shows the tests of between-subjects effects for the variables influencing window-opening behavior: the significance of every variable is less than 0.05. According to the probability Sig value corresponding to the Wald test statistic, a Sig value below 0.05 indicates that the variable has a significant influence on window-opening behavior; every variable in Table 2 therefore has a significant influence on, and is correlated with, window-opening behavior, and can enter the model as an independent variable.
Table 2 Tests of between-subjects effects for the variables influencing window-opening behavior
Dependent variable: window state
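As an illustrative sketch of the screening rule (keep a variable when Sig < 0.05), the code below runs a one-way analysis of variance of window state (0/1) across the levels of each candidate variable and computes the p-value from the F distribution via the regularized incomplete beta function. This is a simplification under stated assumptions: the source describes a multifactor analysis whose Sig values come from statistical software, while here each variable is tested on its own, and all function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd terms of the continued fraction
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) of an F(d1, d2) distribution."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def one_way_anova(groups):
    """F statistic and p-value (Sig) of a one-way ANOVA over groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    d1, d2 = k - 1, n - k
    if ss_within == 0.0:
        return (math.inf, 0.0) if ss_between > 0.0 else (0.0, 1.0)
    f = (ss_between / d1) / (ss_within / d2)
    return f, f_sf(f, d1, d2)


def screen(window_state, levels_by_var, alpha=0.05):
    """Keep variables whose levels differ significantly in mean window state (Sig < alpha)."""
    kept = []
    for name, levels in levels_by_var.items():
        groups = {}
        for y, level in zip(window_state, levels):
            groups.setdefault(level, []).append(y)
        _, sig = one_way_anova(list(groups.values()))
        if sig < alpha:
            kept.append(name)
    return kept


y = [0, 0, 0, 0, 1] * 3 + [1, 1, 1, 1, 0] * 3
candidates = {"related": ["A"] * 15 + ["B"] * 15, "unrelated": ["X", "Y"] * 15}
print(screen(y, candidates))
```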
Specifically, selecting the independent variables with low collinearity according to the set collinearity criterion in step S2 comprises: retaining the independent variables that satisfy the set collinearity criterion as the independent variables with low collinearity, where the criterion is 0 < VIF < 5 and TOL > 0.2, VIF denotes the variance inflation factor of each independent variable, and TOL denotes its tolerance. The independent variables with low collinearity selected in step S2 are: indoor temperature, indoor relative humidity, indoor CO2 concentration, indoor PM2.5 concentration, outdoor wind speed, outdoor wind direction, indoor-outdoor temperature difference, indoor-outdoor relative humidity difference, indoor-outdoor PM2.5 concentration difference, solar radiation and time.
Highly correlated variables make the model unstable, so collinearity diagnostics between the variables are necessary. The variance inflation factor (VIF) is the ratio of the variance of an estimate when multicollinearity exists among the explanatory variables to its variance when there is no multicollinearity. The tolerance (TOL) is the reciprocal of VIF; the larger the VIF, the more severe the collinearity. As a rule of thumb, 0 < VIF < 5 and TOL > 0.2 indicate that there is no multicollinearity. The diagnostics showed high VIF values, i.e. collinearity, for outdoor temperature, outdoor relative humidity and outdoor PM2.5 concentration together with the indoor-outdoor temperature difference, indoor-outdoor relative humidity difference and indoor-outdoor PM2.5 concentration difference. Because the indoor-outdoor differences reflect both indoor and outdoor conditions, the three variables outdoor temperature, outdoor relative humidity and outdoor PM2.5 concentration were removed, which largely eliminates the multicollinearity. Table 3 shows that after this screening every variable has a VIF below 5 and a TOL above 0.2, so the collinearity among the independent variables influencing window opening in the cubicle building is low and the model is relatively stable.
Table 3 Multicollinearity test
Dependent variable: window state
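The collinearity rule can be sketched as follows, assuming the candidate predictors are available as non-constant numeric columns. The code uses the standard identity that the VIF of predictor j equals the j-th diagonal entry of the inverse of the predictors' correlation matrix, with TOL = 1/VIF; the function names are hypothetical. Because TOL = 1/VIF, the two conditions 0 < VIF < 5 and TOL > 0.2 select the same variables.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long, non-constant columns."""
    def centred(col):
        mean = sum(col) / len(col)
        return [x - mean for x in col]
    cs = [centred(c) for c in columns]
    norms = [math.sqrt(sum(x * x for x in c)) for c in cs]
    p = len(cs)
    return [[sum(u * v for u, v in zip(cs[i], cs[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_tol(columns):
    """(VIF, TOL) per predictor: VIF_j is the j-th diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [(inv[j][j], 1.0 / inv[j][j]) for j in range(len(columns))]


def low_collinearity(names, columns, vif_max=5.0, tol_min=0.2):
    """Keep predictors satisfying 0 < VIF < 5 and TOL > 0.2."""
    return [name for name, (vif, tol) in zip(names, vif_tol(columns))
            if 0 < vif < vif_max and tol > tol_min]


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 2.0, 2.9, 4.2, 4.8]   # nearly a copy of x1
x3 = [2.0, -1.0, 0.0, -1.0, 2.0]
print(vif_tol([x1, x2, x3]))
print(low_collinearity(["x1", "x2", "x3"], [x1, x2, x3]))
```

Unlike the source, which removed the outdoor variables by judgment after inspecting the VIF values, this helper simply drops every variable that fails the rule; in practice the diagnostics are usually rerun after each removal.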
Dummy variables are then set for the independent variables selected in step S2. Outdoor relative humidity is used here to illustrate the method. First, outdoor relative humidity is divided into 6 levels at 15% intervals: less than 20%, 20%-35%, 35%-50%, 50%-65%, 65%-80% and greater than 80%. The category "less than 20%" is chosen as the reference category, and the remaining 5 levels are set as dummy variables. Using "0"/"1" coding, the 5 dummy variables are redefined as level 1, level 2, level 3, level 4 and level 5, where "1" means "yes" and "0" means "no"; the coding list is shown in Table 4. When all level codes are 0, the relative humidity is less than 20%, which is the reference category; when "level 1" is 1 and the other levels are 0, the relative humidity is 20%-35%; when "level 2" is 1 and the others are 0, it is 35%-50%; when "level 3" is 1 and the others are 0, it is 50%-65%; when "level 4" is 1 and the others are 0, it is 65%-80%; and when "level 5" is 1 and the others are 0, it is greater than 80%.
Table 4 Coding of the outdoor relative humidity categories
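The coding scheme described above (Table 4) can be generated mechanically. In this minimal Python sketch, built around the hypothetical helper `coding_table`, the reference level is coded as all zeros and every other level sets exactly one of the five dummy columns "level 1" to "level 5"; passing `reference=5` would instead use the highest band as the reference, which the background section mentions as an alternative.

```python
def coding_table(labels, reference=0):
    """Dummy codes per level: the reference level is all zeros, every other level sets one column."""
    others = [label for i, label in enumerate(labels) if i != reference]
    return {label: tuple(int(label == other) for other in others) for label in labels}


RH_LABELS = ["<20%", "20%-35%", "35%-50%", "50%-65%", "65%-80%", ">80%"]
table = coding_table(RH_LABELS, reference=0)
print("range      level1 level2 level3 level4 level5")
for label, codes in table.items():
    print(f"{label:<10} " + "      ".join(str(c) for c in codes))
```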
Each outdoor relative humidity dummy variable in the resulting model can then be interpreted as the influence on the dependent variable (window state) of outdoor relative humidity changing by a multiple of 15%, compared with the reference category "less than 20%".
Likewise, the dummy variables for the different levels of all independent variables after recoding are listed in Table 5.
Table 5 Dummy variable coding of the redefined categorical variables
Once the dummy variables are set up, the binary logistic regression model can be established to obtain the logistic regression coefficient of each dummy variable for each level of the independent variables, and hence the logistic regression equation for predicting window-opening behavior. Finally, the model built with dummy variables and the model built with continuous variables are compared by a goodness-of-fit test and by window-opening prediction accuracy to determine which is more accurate.
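The two comparison criteria can be sketched in plain Python. `accuracy` scores predictions at the 0.5 cut-off, and `hosmer_lemeshow` is a simplified Hosmer-Lemeshow test: cases are sorted by predicted probability and split into equally sized groups, and the statistic is compared with a chi-square distribution on groups - 2 degrees of freedom. Statistical packages may form the groups differently (for example around tied probabilities), so results need not match Table 7 exactly. The function names are hypothetical, the closed-form chi-square tail used here holds only for an even number of degrees of freedom, and every group is assumed to be non-empty with a mean predicted probability strictly between 0 and 1.

```python
import math


def accuracy(y_true, p_pred, cutoff=0.5):
    """Share of cases where the predicted state (open if P >= cutoff) matches the observed one."""
    hits = sum((p >= cutoff) == bool(y) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)


def chi2_sf_even_df(x, df):
    """Chi-square upper tail for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true, p_pred, groups=10):
    """Simplified Hosmer-Lemeshow test with equally sized risk groups; returns (statistic, p)."""
    if groups < 4 or groups % 2:
        raise ValueError("this sketch needs an even number of groups >= 4 (even df)")
    pairs = sorted(zip(p_pred, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of open windows
        observed = sum(y for _, y in chunk)   # observed number of open windows
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)


y = [0, 1, 0, 1]
p = [0.5, 0.5, 0.5, 0.5]
print(accuracy(y, p))                   # every case is predicted open at P >= 0.5
print(hosmer_lemeshow(y, p, groups=4))
```

As in the text, a Hosmer-Lemeshow p-value above 0.05 would indicate no significant difference between predicted and observed outcomes, i.e. acceptable calibration.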
Window-opening behavior is a binary classification problem (0 means the window is closed, 1 means it is open), and binary logistic regression is currently the algorithm most often used in window-opening research in China and abroad. Here the window state is the dependent variable, and the factors influencing window-opening behavior are the independent variables.
Assume the dependent variable in the binary logistic model is y, where y = 1 means the event occurs and y = 0 means it does not; the n independent variables influencing y are x1, x2, x3, ..., xn; P is the probability that the event occurs and 1 - P the probability that it does not. The binary logistic model is:
$$P = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)} \qquad (1)$$
To obtain the linear form of the binary logistic regression model, first compute the odds of the event, i.e. the ratio P/(1 - P) of the probability that it occurs to the probability that it does not; taking the logarithm then gives the linear model:
$$\ln\frac{P}{1-P} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$
Thus the regression coefficient of the binary logistic equation can be understood as the change in the log odds caused by a one-unit change in the independent variable. Because the odds are easier to interpret than the log odds, the equation is usually written as:
$$\frac{P}{1-P} = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)$$
β0 is the constant. When the other explanatory variables are held constant and the i-th independent variable changes by one unit, the odds are multiplied by exp(βi). If βi is positive, the probability of the event increases and exp(βi) > 1; if βi is negative, the probability decreases and exp(βi) < 1; if βi = 0, then exp(βi) = 1, meaning the odds, and hence the probability, do not change whatever the value of that independent variable;
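The interpretation of exp(βi) can be checked numerically. The sketch below uses made-up coefficients for a two-variable model, raises x1 by one unit while holding x2 fixed, and confirms that the odds P/(1 - P) are multiplied by exp(β1) and that the log odds equal the linear predictor.

```python
import math


def prob(z):
    """Logistic function P = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds of the event, P / (1 - P)."""
    return p / (1.0 - p)


# Made-up coefficients and observation for Z = b0 + b1*x1 + b2*x2.
b0, b1, b2 = -1.0, 0.7, -0.3
x1, x2 = 2.0, 1.0

z_before = b0 + b1 * x1 + b2 * x2
z_after = b0 + b1 * (x1 + 1.0) + b2 * x2  # x1 raised by one unit, x2 held fixed

odds_ratio = odds(prob(z_after)) / odds(prob(z_before))
print(odds_ratio, math.exp(b1))                  # equal up to rounding
print(math.log(odds(prob(z_before))), z_before)  # log odds equal the linear predictor
```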
For ease of interpretation, let $Z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$; logistic regression equation (1) can then be written as:
$$P = \frac{1}{1 + e^{-Z}}$$
In the above equation, 0.5 is used as the cut-off value for P: P ≥ 0.5 indicates that the window is open, and P < 0.5 indicates that it is closed. The results of the binary logistic regression analysis are shown in Table 6, where "B" denotes the regression coefficient of each variable and the subscript denotes the dummy variable for each level of that variable. This gives the equation:
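$$Z = B_{(i)}\mathrm{Time}_{(i)} + B_{(k)}\mathrm{RH}_{(k)} + B_{(k)}\mathrm{CO}_{2,(k)} + B_{(k)}\Phi_{(k)} + B_{(k)}\Delta T_{(k)} + B_{(k)}\Delta\mathrm{RH}_{(k)} + B_{(k)}\Delta\mathrm{PM2.5}_{(k)} + B_{(m)}\mathrm{PM2.5}_{(m)} + B_{(m)}V_{(m)} + B_{(n)}T_{(n)} + B_{(n)}R_{(n)} - 3.765 \qquad (6)$$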
where
Time is the time, i ∈ [1, 11];
RH is the indoor relative humidity, k ∈ [1, 5];
CO2 is the indoor carbon dioxide concentration, k ∈ [1, 5];
Φ is the wind direction, k ∈ [1, 5];
PM2.5 is the indoor PM2.5 concentration, k ∈ [1, 5];
ΔT is the indoor-outdoor temperature difference, k ∈ [1, 5];
ΔRH is the indoor-outdoor relative humidity difference, k ∈ [1, 5];
ΔPM2.5 is the indoor-outdoor PM2.5 concentration difference, m ∈ [1, 4];
V is the wind speed, m ∈ [1, 4];
T is the indoor temperature, n ∈ [1, 3];
R is the solar radiation, n ∈ [1, 3].
Finally show that the dummy variable parameter under each independent variable different level is as shown in table 6 by logistic regression analysis.In table Logistic regression coefficient " B " and " EXP (B) " be logarithmic relationship, will " B " progress Logarithm conversion after obtain " EXP (B) "."EXP (B) " value be used to explain different level dummy variable with referring to classification compare its to windowing influence degree how.Such as interior PM2.5 Concentration is divided into four levels, and it is 1-35ug/m3 that " level 1 ", which is interior PM2.5 concentration distribution range, as shown in Table 1, by " water For " indoor pm2.5 concentration " in flat 1 " i.e. table 6 as referring to classification, be left three levels is " indoor pm2.5 in table 6 respectively Concentration (1) ", " indoor pm2.5 concentration (2) ", " indoor pm2.5 concentration (3) " be three horizontal dummy variables respectively with reference Classification compares.Corresponding EXP (B) value of as shown in table 6 interior three levels of PM2.5 concentration is respectively 1.676,2.289, 7.461, cause the probability to open a window to be referring to classification with referring to classification " indoor PM2.5 concentration " comparison " indoor pm2.5 concentration (1) " 1.676 times, " indoor pm2.5 concentration (2) " causes the probability of windowing to be " the indoor pm2.5 concentration referring to 2.289 times of classification (3) " causing the probability to open a window is 7.461 times of reference classification, and three numerical value illustrate that indoor PM2.5 concentration is bigger, at window It is bigger in the open state a possibility that, it may be interpreted as the windowing when outdoor pollution is more serious and introduced outdoor PM2.5 particulate matter Cause indoor PM2.5 concentration higher.
Table 6. Multi-parameter binary logistic regression results for window-opening behavior
Whether an independent variable is significant in the fitted model is usually judged from the probability Sig value associated with its Wald test statistic. If, in the final result, the Sig value of an independent variable is less than 0.05, that variable has a significant effect on the dependent variable and the two are correlated. However, Table 6 shows that for some independent variables the Sig value of certain dummy variables exceeds 0.05, meaning that the comparison between that level's dummy variable and the reference category is not statistically significant for window-opening behavior. Before the model was built, analysis of variance on all the independent variables showed that each of them is statistically significant with respect to window-opening behavior (see Table 2), so a Sig value above 0.05 here does not mean that the variable itself is not statistically significant. It follows that, when a binary logistic regression model is built in which some dummy variables of the same independent variable are statistically significant and others are not, all of that variable's dummy variables should be included in the model so that each dummy variable keeps its correct meaning. The Enter (forced-entry) method must therefore be selected when the dummy variables are introduced into the model, so that all dummy variables are retained in the final model.
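As a rough illustration of the Wald test and of why every dummy of a variable is retained, the Python sketch below computes the Wald statistic W = (B / S.E.)² and its Sig value for one degree of freedom, flags significance at 0.05, and still returns every dummy, mirroring forced entry. The (B, S.E.) pairs in the example and the function names are hypothetical, not values or code from the patent.

```python
import math

def wald_test(b: float, se: float) -> tuple[float, float]:
    """Wald chi-square statistic (1 df) for H0: B = 0, and its Sig value."""
    wald = (b / se) ** 2
    # With 1 degree of freedom, P(chi2 > W) = P(|z| > sqrt(W)) = erfc(sqrt(W / 2)).
    sig = math.erfc(math.sqrt(wald / 2.0))
    return wald, sig

def report_dummies(dummies: dict[str, tuple[float, float]], alpha: float = 0.05):
    """Flag each dummy's significance but keep every column (forced entry):
    dropping a non-significant level would silently pool it with the reference."""
    rows = []
    for name, (b, se) in dummies.items():
        wald, sig = wald_test(b, se)
        rows.append((name, round(wald, 3), round(sig, 4), sig < alpha))
    return rows  # all dummies are returned, significant or not

if __name__ == "__main__":
    # Hypothetical (B, S.E.) pairs for one variable with three dummies.
    example = {"X (1)": (0.52, 0.30), "X (2)": (0.83, 0.31), "X (3)": (2.01, 0.35)}
    for row in report_dummies(example):
        print(row)
```

Removing a dummy column would give that level's cases all-zero codes, exactly like the reference level, which is the loss of meaning the paragraph above warns about.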
Table 7 compares the model analysis parameters. Omnibus test of model coefficients: the Omnibus significance values are all below 0.05, indicating that the linear relationship between all the independent variables and logit P is significant, so using this model is reasonable. Goodness of fit: the Nagelkerke R² statistic ranges from 0 to 1, and the closer it is to 1, the better the regression equation fits; the comparison shows that the model built with dummy variables fits better than the continuous-variable model. The Hosmer-Lemeshow goodness-of-fit test evaluates the calibration of a prediction model and is an important indicator of how accurately the window-opening model predicts the probability of the window state; it reflects the agreement between the model's predictions and the actual outcomes, and is therefore also called consistency. Good calibration means high prediction accuracy. If the test result is statistically significant (P < 0.05), there is some discrepancy between the predicted and observed values and the calibration is poor; conversely, P > 0.05 indicates good calibration. The comparison shows that the Hosmer-Lemeshow P value of the dummy-variable model is 0.122, much higher than that of the continuous-variable model; that is, the dummy-variable prediction model is more accurate than the continuous-variable model.

AUC is the area under the ROC curve and is an index of a prediction model's discriminative ability. The prediction model sets a cutoff value: predictions above it are classified as window open and predictions below it as window closed, and how correctly the model distinguishes the window state in this way is its discrimination. The larger the AUC, the better the discrimination: generally, AUC < 0.6 indicates poor discrimination, 0.6-0.75 indicates some discriminative ability, and AUC > 0.75 indicates good discrimination. The comparison shows that the AUC of the dummy-variable model is 0.81, indicating good discrimination and exceeding that of the continuous-variable model. The comparison also shows that the dummy-variable model predicts window-opening behavior with an accuracy of 74.8%, higher than the continuous-variable model. Overall, in terms of both goodness of fit and prediction accuracy, the logistic regression model built with dummy variables outperforms the continuous-variable model.
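The indicators discussed above can be computed from observed window states (1 = open, 0 = closed) and a model's predicted probabilities. The following Python sketch is illustrative only: it uses the standard Cox-Snell/Nagelkerke formulas, a rank-based (Mann-Whitney) AUC, accuracy at an assumed cutoff of P ≥ 0.5, and a Hosmer-Lemeshow test over ten equal-size groups of sorted predictions. Statistical packages such as SPSS may group tied predictions differently, so HL values need not match them exactly; predicted probabilities must lie strictly between 0 and 1, and both window states must occur. The function names are hypothetical.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 window states y under probabilities p."""
    return sum(math.log(pi) if yi else math.log(1.0 - pi) for yi, pi in zip(y, p))

def nagelkerke_r2(y, p):
    """Cox-Snell R^2 rescaled so that its maximum attainable value is 1."""
    n = len(y)
    base_rate = sum(y) / n  # intercept-only (null) model predicts the base rate
    ll0 = log_likelihood(y, [base_rate] * n)
    ll1 = log_likelihood(y, p)
    cox_snell = 1.0 - math.exp(2.0 * (ll0 - ll1) / n)
    return cox_snell / (1.0 - math.exp(2.0 * ll0 / n))

def auc(y, p):
    """ROC AUC as the share of (open, closed) pairs ranked correctly; ties count 0.5."""
    pos = [pi for yi, pi in zip(y, p) if yi]
    neg = [pi for yi, pi in zip(y, p) if not yi]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return score / (len(pos) * len(neg))

def accuracy(y, p, cutoff=0.5):
    """Share of cases whose predicted state (open if P >= cutoff) matches y."""
    return sum((pi >= cutoff) == bool(yi) for yi, pi in zip(y, p)) / len(y)

def hosmer_lemeshow(y, p, groups=10):
    """HL statistic over equal-size groups of sorted P, with its p-value on
    groups - 2 degrees of freedom (closed-form chi-square tail for even df)."""
    n = len(y)
    if groups % 2 or groups < 4 or n < groups:
        raise ValueError("need an even number of groups >= 4 and one case per group")
    pairs = sorted(zip(p, y))
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(yi for _, yi in chunk)
        expected = sum(pi for pi, _ in chunk)
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / size))
    half, k = stat / 2.0, (groups - 2) // 2
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return stat, p_value
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75, since three of the four (open, closed) pairs are ranked correctly.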
Table 7. Comparison of model parameters
In the method provided by the invention for predicting window-opening behavior with dummy variables based on a binary logistic model, dummy variables are set to represent the differences in influence on window opening among the levels of each independent variable. Building the regression model yields one regression coefficient per dummy variable, which explains in more detail how each independent variable influences window-opening behavior at its different levels. Compared with previous studies, in which each independent variable has only a single regression coefficient, this explanation of window-opening behavior has greater practical significance.
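The level division of step S1 and the dummy coding described above can be sketched as follows. This is a minimal Python illustration that assumes equal-frequency (quantile) cut points as one way to give each level a balanced number of cases; the patent's actual thresholds (for example 1-35 µg/m³ for indoor PM2.5 level 1, per Table 1) come from its measured data and may differ. Level 1 is the reference category and is coded as all zeros, so a variable with four levels yields the three dummies labelled (1) to (3). The function names are hypothetical.

```python
from bisect import bisect_right

def equal_frequency_cuts(values, n_levels):
    """Cut points splitting the sorted sample into n_levels groups of
    (roughly) equal size, so each level holds a balanced number of cases."""
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // n_levels] for i in range(1, n_levels)]

def to_level(x, cuts):
    """1-based level index; level 1 lies below the first cut point."""
    return bisect_right(cuts, x) + 1

def dummy_code(level, n_levels, reference=1):
    """Reference-category (indicator) coding: n_levels - 1 columns,
    all zero for the reference level, a single 1 for any other level."""
    return [1 if level == j else 0 for j in range(1, n_levels + 1) if j != reference]

if __name__ == "__main__":
    sample = [12, 48, 30, 75, 20, 95, 60, 140]  # hypothetical readings
    cuts = equal_frequency_cuts(sample, 4)
    for x in sample:
        level = to_level(x, cuts)
        print(x, level, dummy_code(level, 4))
```

For the values 1 to 8 split into four levels, the cut points are [3, 5, 7], each level holds two cases, and a level-3 observation is coded [0, 1, 0].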
A specific example is used herein to illustrate the principle and implementation of the invention; the description of the above embodiment is only intended to help understand the method and core idea of the invention. Meanwhile, those skilled in the art may make changes to the specific implementation and scope of application in accordance with the idea of the invention. In conclusion, the content of this specification should not be construed as limiting the invention.

Claims (5)

1. A method for predicting window-opening behavior with dummy variables based on a binary logistic model, characterized in that the method comprises the following steps:
S1, multi-factor analysis of variance: collect multiple indoor and outdoor variables; divide each variable into levels according to the distribution of its measured data, with a balanced number of cases in each level; meanwhile, determine whether each level-divided categorical variable follows a normal distribution, and perform analysis of variance on the categorical variables that do; according to the probability Sig value corresponding to the Wald test statistic of a variable, if the Sig value is less than 0.05, the variable has a significant effect on window-opening behavior, so that the independent variables related to window-opening behavior are screened out;
S2, collinearity diagnostics: perform collinearity diagnostics on the screened independent variables related to window-opening behavior and, according to a set collinearity decision condition, select the independent variables with low collinearity;
S3, establishment of the binary logistic regression model and prediction of window-opening behavior: set dummy variables for the independent variables selected in step S2, establish the binary logistic regression model, obtain the logistic regression coefficient of each dummy variable from the independent variables, and substitute the logistic regression coefficients into the binary logistic regression equation to obtain the predicted value of window-opening behavior.
2. The method for predicting window-opening behavior with dummy variables based on a binary logistic model according to claim 1, characterized in that selecting the independent variables with low collinearity according to the set collinearity decision condition in step S2 specifically comprises: selecting the independent variables that satisfy the set collinearity decision condition as the low-collinearity independent variables, wherein the collinearity decision condition is 0 < VIF < 5 and TOL > 0.2, VIF denotes the variance inflation factor of each independent variable, and TOL denotes the tolerance of each independent variable.
3. The method for predicting window-opening behavior with dummy variables based on a binary logistic model according to claim 1, characterized in that collecting multiple indoor and outdoor variables in step S1 specifically comprises: collecting indoor temperature, indoor relative humidity, indoor CO2 concentration, indoor PM2.5 concentration, outdoor temperature, outdoor relative humidity, outdoor PM2.5 concentration, outdoor wind speed, outdoor wind direction, indoor-outdoor temperature difference, indoor-outdoor relative humidity difference, indoor-outdoor PM2.5 concentration difference, solar radiation and time.
4. The method for predicting window-opening behavior with dummy variables based on a binary logistic model according to claim 3, characterized in that selecting the low-collinearity independent variables in step S2 specifically comprises: the low-collinearity independent variables include indoor temperature, indoor relative humidity, indoor CO2 concentration, indoor PM2.5 concentration, outdoor wind speed, outdoor wind direction, indoor-outdoor temperature difference, indoor-outdoor relative humidity difference, indoor-outdoor PM2.5 concentration difference, solar radiation and time.
5. The method for predicting window-opening behavior with dummy variables based on a binary logistic model according to claim 4, characterized in that establishing the binary logistic regression model in step S3, setting dummy variables for the independent variables selected in step S2, obtaining the logistic regression coefficient of each dummy variable from the independent variables, and substituting the logistic regression coefficients into the binary logistic regression equation to obtain the predicted value of window-opening behavior specifically comprise:
The binary logistic regression equation is expressed as:
$$P = \frac{1}{1 + e^{-Z}}$$
where P is the predicted probability that the window is open and 0.5 is the cutoff value: P ≥ 0.5 indicates that the window is open, and P < 0.5 indicates that the window is closed;
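$$Z = B_{(i)}\,\mathrm{Time}_{(i)} + B_{(k)}\,\mathrm{RH}_{(k)} + B_{(k)}\,{\mathrm{CO_2}}_{(k)} + B_{(k)}\,\Phi_{(k)} + B_{(k)}\,\Delta T_{(k)} + B_{(k)}\,\Delta\mathrm{RH}_{(k)} + B_{(k)}\,\Delta{\mathrm{PM2.5}}_{(k)} + B_{(m)}\,{\mathrm{PM2.5}}_{(m)} + B_{(m)}\,V_{(m)} + B_{(n)}\,T_{(n)} + B_{(n)}\,R_{(n)} - 3.765$$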
where B denotes the regression coefficient of each independent variable and the subscripts denote the dummy variables for the different levels of each independent variable; Time denotes time, RH indoor relative humidity, CO2 indoor CO2 concentration, Φ outdoor wind direction, ΔT indoor-outdoor temperature difference, ΔRH indoor-outdoor relative humidity difference, ΔPM2.5 indoor-outdoor PM2.5 concentration difference, PM2.5 indoor PM2.5 concentration, V outdoor wind speed, T indoor temperature, and R solar radiation.
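Claims 2 and 5 describe two computational steps, the VIF/TOL collinearity screen and the logistic prediction with a 0.5 cutoff, and the Python sketch below illustrates both under stated assumptions. It computes VIF_j as the j-th diagonal element of the inverse correlation matrix (a standard identity, equivalent to 1 / (1 - R_j²)) and TOL = 1 / VIF; because TOL = 1 / VIF, the conditions VIF < 5 and TOL > 0.2 coincide. The prediction uses the intercept -3.765 from the claimed equation, applies P = 1 / (1 + e^(-Z)) and treats P ≥ 0.5 as open, which is an assumption about the direction of the cutoff. Function names and the coefficient dictionary are hypothetical, and the helpers assume no constant or perfectly collinear columns.

```python
import math

def correlation_matrix(columns):
    """Pearson correlations between equally long numeric columns."""
    centred = [[v - sum(c) / len(c) for v in c] for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centred]
    k = len(centred)
    return [[sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(m):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    k = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        lead = a[col][col]
        a[col] = [v / lead for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[k:] for row in a]

def vif_screen(named_columns, max_vif=5.0, min_tol=0.2):
    """Return {name: (VIF, TOL, kept)} using 0 < VIF < max_vif and TOL > min_tol."""
    names = list(named_columns)
    inv = invert(correlation_matrix([named_columns[n] for n in names]))
    result = {}
    for j, name in enumerate(names):
        vif = inv[j][j]
        tol = 1.0 / vif
        result[name] = (vif, tol, 0.0 < vif < max_vif and tol > min_tol)
    return result

def predict_window(active_dummies, coefficients, intercept=-3.765, cutoff=0.5):
    """Z = intercept + sum of B over dummies equal to 1; P = 1 / (1 + e^-Z).
    Returns (P, 'open' if P >= cutoff else 'closed')."""
    z = intercept + sum(coefficients[d] for d in active_dummies)
    p = 1.0 / (1.0 + math.exp(-z))
    return p, "open" if p >= cutoff else "closed"
```

With every variable at its reference level, no dummy is active, so Z = -3.765 and P ≈ 0.023, which this sketch classifies as closed.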
CN201910776641.9A 2019-08-22 2019-08-22 Method based on the prediction windowing behavior of Binary Logistic model use dummy variable Pending CN110472197A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910776641.9A CN110472197A (en) 2019-08-22 2019-08-22 Method based on the prediction windowing behavior of Binary Logistic model use dummy variable

Publications (1)

Publication Number Publication Date
CN110472197A true CN110472197A (en) 2019-11-19

Family

ID=68512729

Country Status (1)

Country Link
CN (1) CN110472197A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899437A (en) * 2015-05-29 2015-09-09 杭州辰青和业科技有限公司 Early-warning method for heavy-rainfall type landslide hazard
CN106251241A (en) * 2016-08-02 2016-12-21 贵州电网有限责任公司信息中心 A kind of feature based selects the LR Bagging algorithm improved
CN108182339A (en) * 2018-03-20 2018-06-19 北京工业大学 A kind of window trend prediction method and system based on Gaussian Profile

Non-Patent Citations (2)

Title
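Sun Qiao et al.: "Susceptibility assessment of collapse hazards in the Longjuba area based on dummy variable analysis", Hydrogeology & Engineering Geology *
Zhang Wenye: "Study on the relationship between indoor air quality and allergic respiratory diseases in pregnant women: a case study of Beijing", Engineering Science and Technology I *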

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN112507420A (en) * 2020-11-19 2021-03-16 同济大学 System for constructing personal personalized environment control behavior prediction model training set in office building
CN115907239A (en) * 2023-03-08 2023-04-04 联通(山东)产业互联网有限公司 Method for predicting global annual average temperature
CN115907239B (en) * 2023-03-08 2023-05-09 联通(山东)产业互联网有限公司 Method for predicting global annual average air temperature
CN116881652A (en) * 2023-06-26 2023-10-13 成都理工大学 Landslide vulnerability evaluation method based on optimal negative sample and random forest model
CN116881652B (en) * 2023-06-26 2024-04-05 成都理工大学 Landslide vulnerability evaluation method based on optimal negative sample and random forest model
CN117540346A (en) * 2024-01-09 2024-02-09 四川国蓝中天环境科技集团有限公司 Order class variable redundancy removing method for high-dimensional regression modeling of atmospheric pollution data
CN117540346B (en) * 2024-01-09 2024-03-19 四川国蓝中天环境科技集团有限公司 Order class variable redundancy removing method for high-dimensional regression modeling of atmospheric pollution data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191119