CN110472197A - Method for predicting window-opening behavior using dummy variables based on a Binary Logistic model
- Publication number
- CN110472197A (CN201910776641.9A)
- Authority
- CN
- China
- Prior art keywords
- variable
- outdoor
- indoor
- prediction
- windowing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
Abstract
The invention discloses a method for predicting window-opening behavior using dummy variables based on a Binary Logistic model. The method comprises: multivariate analysis of variance, which selects the independent variables related to window-opening behavior; collinearity diagnosis, which selects the independent variables with low collinearity; and establishment of a binary logistic regression model with dummy variables to predict window-opening behavior. The method applies dummy variables to the binary logistic regression model of window-opening behavior and improves the prediction accuracy of the model.
Description
Technical field
The present invention relates to the technical field of window-opening behavior prediction, and in particular to a method for predicting window-opening behavior using dummy variables based on a Binary Logistic model.
Background
Window-opening behavior is a binary classification problem (0 represents a closed window, 1 represents an open window), so Binary Logistic (binary logistic regression) is the algorithm most often used in window-opening research in China and abroad. The dependent variable is the window state (open or closed), and the independent variables are environmental factors (indoor and outdoor temperature and humidity, PM2.5 concentration, indoor CO2 concentration, wind speed, wind direction, solar radiation, etc.). Existing studies generally enter the independent variables into the regression model directly as continuous variables, and the regression coefficient of each independent variable is used to explain the window state. For example, when outdoor relative humidity enters the model as a continuous variable, its coefficient is interpreted as the effect on the dependent variable (window-opening behavior) of each 1% increase in relative humidity. A 1% increase in humidity, however, usually has a very weak effect on window opening, so this interpretation has little practical significance.
A dummy variable, also called a virtual variable or nominal variable, is an artificially constructed variable that usually takes the value 0 or 1 to reflect an attribute of some variable, where "1" means "yes" and "0" means "no". For an independent variable with n categories, one category is usually chosen as the reference category, which yields n-1 dummy variables. The lowest or the highest level can be chosen as the reference, and the dummy variables of the other levels are compared with it to observe their effect on window-opening behavior. Outdoor relative humidity, a continuous variable, can thus be discretized at intervals of 15% and converted into dummy variables, for example: less than 20%, 20%-35%, 35%-50%, 50%-65%, 65%-80% and greater than 80%. The regression coefficients of the resulting model then describe the effect on the window state of each 15% step in outdoor relative humidity. Each dummy variable represents the difference in influence on window opening between one humidity level and the reference level, and the regression model yields one estimated coefficient per dummy variable, so the regression results are easier to interpret and have more practical meaning. Similarly, parameters such as indoor carbon dioxide concentration, indoor and outdoor temperature, and PM2.5 concentration can be modeled after being divided into levels. This explains in more detail how each independent variable influences window-opening behavior at its different levels. Compared with earlier studies, in which a single regression coefficient per independent variable explains window-opening behavior, this approach describes the behavior more finely and improves the prediction accuracy of the model.
Summary of the invention
The object of the present invention is to provide a method for predicting window-opening behavior using dummy variables based on a Binary Logistic model, which applies dummy variables to the binary logistic regression model of window-opening behavior and improves the prediction accuracy of the model.
To achieve the above object, the present invention provides the following scheme:
A method for predicting window-opening behavior using dummy variables based on a Binary Logistic model, the method comprising the following steps:
S1, multivariate analysis of variance: collect multiple indoor and outdoor variables; divide each variable into levels according to the distribution range of the measured data, keeping the number of cases in each level balanced; check whether the classified variable obtained from each division follows a normal distribution; perform analysis of variance on the classified variables that follow a normal distribution; and, according to the probability (Sig) value corresponding to the Wald test statistic, treat a variable with a Sig value less than 0.05 as having a significant effect on window-opening behavior, thereby selecting the independent variables related to window-opening behavior;
S2, collinearity diagnosis: perform collinearity diagnosis on the selected independent variables related to window-opening behavior, and select the independent variables with low collinearity according to a set collinearity criterion;
S3, binary logistic regression modeling and window-opening behavior prediction: set dummy variables based on the independent variables selected in step S2, establish a binary logistic regression model, obtain the logistic regression coefficient of each dummy variable, and substitute the coefficients into the binary logistic regression equation to obtain the predicted window-opening behavior.
Optionally, selecting the independent variables with low collinearity according to the set collinearity criterion in step S2 specifically includes: the independent variables that satisfy the set collinearity criterion are selected as the low-collinearity independent variables, where the criterion is 0<VIF<5 and TOL>0.2, VIF denotes the variance inflation factor of an independent variable, and TOL denotes its tolerance.
Optionally, in step S1, collecting multiple indoor and outdoor variables specifically includes collecting: indoor temperature, indoor relative humidity, indoor CO2 concentration, indoor PM2.5 concentration, outdoor temperature, outdoor relative humidity, outdoor PM2.5 concentration, outdoor wind speed, outdoor wind direction, indoor-outdoor temperature difference, indoor-outdoor relative humidity difference, indoor-outdoor PM2.5 concentration difference, solar radiation and time.
Optionally, the low-collinearity independent variables selected in step S2 include: indoor temperature, indoor relative humidity, indoor CO2 concentration, indoor PM2.5 concentration, outdoor wind speed, outdoor wind direction, indoor-outdoor temperature difference, indoor-outdoor relative humidity difference, indoor-outdoor PM2.5 concentration difference, solar radiation and time.
Optionally, in step S3, setting dummy variables based on the independent variables selected in step S2, establishing the binary logistic regression model, obtaining the logistic regression coefficient of each dummy variable, and substituting the coefficients into the binary logistic regression equation to obtain the predicted window-opening behavior specifically includes:
The binary logistic regression equation is expressed as:
$$P = \frac{1}{1 + e^{-Z}}$$
In the formula, 0.5 is the cut-off value of P: P ≥ 0.5 represents an open window and P < 0.5 represents a closed window;
$$Z = B_{(i)}\mathrm{Time}_{(i)} + B_{(k)}\mathrm{RH}_{(k)} + B_{(k)}\mathrm{CO_2}_{(k)} + B_{(k)}\Phi_{(k)} + B_{(k)}\Delta T_{(k)} + B_{(k)}\Delta\mathrm{RH}_{(k)} + B_{(k)}\Delta\mathrm{PM2.5}_{(k)} + B_{(m)}\mathrm{PM2.5}_{(m)} + B_{(m)}V_{(m)} + B_{(n)}T_{(n)} + B_{(n)}R_{(n)} - 3.765;$$
where B denotes the regression coefficient of each independent variable, the subscript identifies the dummy variable of each level of that independent variable, Time denotes time, RH denotes indoor relative humidity, CO2 denotes indoor CO2 concentration, Φ denotes outdoor wind direction, ΔT denotes the indoor-outdoor temperature difference, ΔRH denotes the indoor-outdoor relative humidity difference, ΔPM2.5 denotes the indoor-outdoor PM2.5 concentration difference, PM2.5 denotes indoor PM2.5 concentration, V denotes outdoor wind speed, T denotes indoor temperature, and R denotes solar radiation.
According to the specific embodiments provided by the present invention, the invention achieves the following technical effects: in the method for predicting window-opening behavior using dummy variables based on a Binary Logistic model, dummy variables represent the differences in influence on window opening between the levels of an independent variable, and the regression model yields one regression coefficient for each dummy variable. This explains in more detail how each independent variable influences window-opening behavior at its different levels, and is of more practical significance than earlier studies, in which a single regression coefficient per independent variable explains window-opening behavior.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the method for predicting window-opening behavior using dummy variables based on a Binary Logistic model in an embodiment of the present invention;
Fig. 2 shows the level distribution and normal curve of indoor temperature in an embodiment of the present invention;
Fig. 3 shows the level distribution and normal curve of indoor relative humidity in an embodiment of the present invention;
Fig. 4 shows the level distribution and normal curve of indoor CO2 concentration in an embodiment of the present invention;
Fig. 5 shows the level distribution and normal curve of indoor PM2.5 concentration in an embodiment of the present invention;
Fig. 6 shows the level distribution and normal curve of outdoor temperature in an embodiment of the present invention;
Fig. 7 shows the level distribution and normal curve of outdoor relative humidity in an embodiment of the present invention;
Fig. 8 shows the level distribution and normal curve of outdoor PM2.5 concentration in an embodiment of the present invention;
Fig. 9 shows the level distribution and normal curve of solar radiation in an embodiment of the present invention;
Fig. 10 shows the level distribution and normal curve of outdoor wind speed in an embodiment of the present invention;
Fig. 11 shows the level distribution and normal curve of outdoor wind direction in an embodiment of the present invention;
Fig. 12 shows the level distribution and normal curve of the indoor-outdoor temperature difference in an embodiment of the present invention;
Fig. 13 shows the level distribution and normal curve of the indoor-outdoor relative humidity difference in an embodiment of the present invention;
Fig. 14 shows the level distribution and normal curve of the indoor-outdoor PM2.5 concentration difference in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The object of the present invention is to provide a method for predicting window-opening behavior using dummy variables based on a Binary Logistic model, which applies dummy variables to the binary logistic regression model of window-opening behavior and improves the prediction accuracy of the model.
To make the above objects, features and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flow chart of the method for predicting window-opening behavior using dummy variables based on a Binary Logistic model in an embodiment of the present invention. As shown in Fig. 1, the method provided by the present invention comprises the following steps:
S1, multivariate analysis of variance: collect multiple indoor and outdoor variables; divide each variable into levels according to the distribution range of the measured data, keeping the number of cases in each level balanced; check whether the classified variable obtained from each division follows a normal distribution; perform analysis of variance on the classified variables that follow a normal distribution; and, according to the probability (Sig) value corresponding to the Wald test statistic, treat a variable with a Sig value less than 0.05 as having a significant effect on window-opening behavior, thereby selecting the independent variables related to window-opening behavior;
S2, collinearity diagnosis: perform collinearity diagnosis on the selected independent variables related to window-opening behavior, and select the independent variables with low collinearity according to a set collinearity criterion;
S3, binary logistic regression modeling and window-opening behavior prediction: set dummy variables based on the independent variables selected in step S2, establish a binary logistic regression model, obtain the logistic regression coefficient of each dummy variable, and substitute the coefficients into the binary logistic regression equation to obtain the predicted window-opening behavior.
In step S1, collecting multiple indoor and outdoor variables specifically includes collecting: indoor temperature, indoor relative humidity, indoor CO2 concentration, indoor PM2.5 concentration, outdoor temperature, outdoor relative humidity, outdoor PM2.5 concentration, outdoor wind speed, outdoor wind direction, indoor-outdoor temperature difference, indoor-outdoor relative humidity difference, indoor-outdoor PM2.5 concentration difference, solar radiation and time.
Each variable is then divided into levels according to the distribution range of the measured data. To keep later modeling stable, the division should avoid levels with too many or too few cases, so that the number of cases per level is balanced. In addition, the classified variable obtained after dividing an independent variable into levels must follow a normal distribution before analysis of variance can be used to select the variables related to the dependent variable (window state). The division details are given in Table 1.
Table 1. Level distribution of the factors influencing window-opening behavior
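The level division described above can be sketched as follows. This is only an illustration under stated assumptions: the text asks for balanced case counts per level but does not say how the cut points were chosen, so equal-frequency (quantile) cut points are assumed here, and the temperature readings and the helper name `equal_frequency_levels` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def equal_frequency_levels(values, n_levels):
    """Return (cut_points, levels); levels are 1-based with roughly equal case counts."""
    # quantiles() returns the n_levels - 1 interior cut points of the data.
    cuts = quantiles(values, n=n_levels)
    # A value above j cut points belongs to level j + 1.
    levels = [bisect_right(cuts, v) + 1 for v in values]
    return cuts, levels


# Hypothetical indoor temperature readings in deg C (not data from the patent).
readings = [18.2, 19.5, 20.1, 20.8, 21.4, 22.0, 22.3, 23.1, 23.9, 24.6, 25.2, 26.7]
cuts, levels = equal_frequency_levels(readings, n_levels=3)
counts = Counter(levels)
print("cut points:", cuts)
print("cases per level:", dict(counts))
```

Checking `counts` before modeling is a quick way to spot a level with too few cases, which the text warns would make the later model unstable.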
The histogram of each variable shows the distribution of its levels and whether the variable follows a normal distribution. Figs. 2-14 show that every variable follows a normal distribution after division into levels, so multi-factor analysis of variance can be applied directly to select the independent variables significantly related to window-opening behavior.
Multivariate analysis of variance was performed on the normally distributed variables. As Table 2 shows, the tests of between-subjects effects give a significance below 0.05 for every variable. According to the probability (Sig) value corresponding to the Wald test statistic, a Sig value less than 0.05 indicates that the variable has a significant effect on window-opening behavior. Every variable in Table 2 therefore has a significant effect on, and is correlated with, window-opening behavior, and can enter the model as an independent variable.
Effect is examined and determine between the windowing behavioral implications variable main body of table 2
Dependent variable: window state
In step S2, selecting the independent variables with low collinearity according to the set collinearity criterion specifically includes: the independent variables that satisfy the set collinearity criterion are selected as the low-collinearity independent variables, where the criterion is 0<VIF<5 and TOL>0.2, VIF denotes the variance inflation factor of an independent variable, and TOL denotes its tolerance. The low-collinearity independent variables selected in step S2 include: indoor temperature, indoor relative humidity, indoor CO2 concentration, indoor PM2.5 concentration, outdoor wind speed, outdoor wind direction, indoor-outdoor temperature difference, indoor-outdoor relative humidity difference, indoor-outdoor PM2.5 concentration difference, solar radiation and time.
Highly correlated variables make a model unstable, so collinearity between the variables must be diagnosed. The variance inflation factor (VIF) is the ratio of the variance when multicollinearity exists among the explanatory variables to the variance when it does not. The tolerance (TOL) is the reciprocal of VIF; the larger the VIF, the more serious the collinearity. As a rule of thumb, 0<VIF<5 and TOL>0.2 indicate that there is no multicollinearity. The test found that outdoor temperature, outdoor relative humidity and outdoor PM2.5 concentration had high VIF values and were collinear with the indoor-outdoor temperature difference, the indoor-outdoor relative humidity difference and the indoor-outdoor PM2.5 concentration difference. Because the indoor-outdoor differences reflect indoor and outdoor conditions together, the three variables outdoor temperature, outdoor relative humidity and outdoor PM2.5 concentration were removed, which largely reduced the multicollinearity. As Table 3 shows, after this selection every variable has a VIF value below 5 and a TOL value above 0.2, so the collinearity between the independent variables influencing window-opening behavior in the cubicle building is low and the model is relatively stable.
Table 3. Multicollinearity test
Dependent variable: window state
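The collinearity check can be illustrated with a short numerical sketch. It assumes NumPy is available and relies on the standard identity that the VIF of variable j is the j-th diagonal element of the inverse correlation matrix of the predictors; the synthetic data and the function names are hypothetical and are not taken from the patent's measurements. Since TOL = 1/VIF, the two conditions 0<VIF<5 and TOL>0.2 express the same threshold.

```python
import numpy as np


def vif_and_tolerance(X):
    """Return (VIF, TOL) for the columns of X, one observation per row."""
    corr = np.corrcoef(X, rowvar=False)
    vif = np.diag(np.linalg.inv(corr))  # VIF_j = [R^-1]_jj
    return vif, 1.0 / vif


def low_collinearity(vif, tol):
    """Criterion from the text: 0 < VIF < 5 and TOL > 0.2."""
    return (vif > 0) & (vif < 5) & (tol > 0.2)


# Synthetic illustration only: the temperature difference is built from the
# outdoor temperature, so those two columns are strongly correlated.
rng = np.random.default_rng(0)
n = 500
outdoor_t = rng.normal(15.0, 5.0, n)
indoor_t = rng.normal(24.0, 1.0, n)
temp_diff = indoor_t - outdoor_t
wind_speed = rng.normal(3.0, 1.0, n)

X = np.column_stack([outdoor_t, temp_diff, wind_speed])
vif, tol = vif_and_tolerance(X)
keep = low_collinearity(vif, tol)
for name, v, t, k in zip(["outdoor_t", "temp_diff", "wind_speed"], vif, tol, keep):
    print(f"{name}: VIF={v:.2f} TOL={t:.3f} keep={bool(k)}")
```

In this toy data, dropping `outdoor_t` and recomputing would bring the VIF of `temp_diff` back near 1, which mirrors the decision in the text to keep the indoor-outdoor differences and remove the outdoor variables.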
Dummy variables are set for the independent variables selected in step S2. Outdoor relative humidity is used here to illustrate how a variable is set as dummy variables. First, outdoor relative humidity is divided at intervals of 15% into 6 levels: less than 20%, 20%-35%, 35%-50%, 50%-65%, 65%-80% and greater than 80%. The category "less than 20%" is chosen as the reference category, and the remaining 5 levels are set as dummy variables. The 5 dummy variables, named level 1, level 2, level 3, level 4 and level 5, are coded with "0" and "1", where "1" means "yes" and "0" means "no"; the coding list is shown in Table 4. When all levels are coded 0, the relative humidity is below 20%, which is the reference category. When "level 1" is coded 1 and the other levels 0, the relative humidity is 20%-35%; when "level 2" is coded 1 and the other levels 0, it is 35%-50%; when "level 3" is coded 1 and the other levels 0, it is 50%-65%; when "level 4" is coded 1 and the other levels 0, it is 65%-80%; and when "level 5" is coded 1 and the other levels 0, it is greater than 80%.
Table 4. Dummy coding list for outdoor relative humidity
Each outdoor relative humidity dummy variable in the resulting model can be interpreted as the influence on the dependent variable (window state) of a given 15% step in outdoor relative humidity compared with relative humidity "less than 20%".
The dummy variables of all independent variables after recoding are listed in Table 5.
Table 5. Dummy variable coding of the redefined classified variables
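The outdoor relative humidity coding described above can be sketched in a few lines. The interval edges and the reference category come from the text; the function names are hypothetical, and the handling of exact boundary values (a reading of exactly 20% is placed in 20%-35%, and so on) is an assumption, since the text does not say which interval owns a boundary.

```python
from bisect import bisect_right

# Upper edges of the first five intervals, in percent relative humidity.
EDGES = [20, 35, 50, 65, 80]
LABELS = ["<20% (reference)", "20%-35%", "35%-50%", "50%-65%", "65%-80%", ">80%"]


def rh_level(rh_percent):
    """Return 0 for the reference category, or 1..5 for dummy levels 1..5."""
    return bisect_right(EDGES, rh_percent)


def rh_dummies(rh_percent):
    """Return the 0/1 codes for [level 1, level 2, level 3, level 4, level 5]."""
    level = rh_level(rh_percent)
    return [1 if level == k else 0 for k in range(1, 6)]


for rh in (10, 28, 42, 58, 71, 90):
    print(f"RH={rh:>2}%  {LABELS[rh_level(rh)]:<17} {rh_dummies(rh)}")
```

Six categories produce five dummy columns, and the reference category is the row in which all five are 0, which matches the n-1 rule stated earlier in the description.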
Once the dummy variables are set, the binary logistic regression model can be established to obtain the logistic regression coefficient of each level's dummy variable for every independent variable, and from these the logistic regression equation for predicting window-opening behavior. Finally, the "model goodness-of-fit test" and the "window-opening behavior prediction accuracy" are used to compare the model built with dummy variables against the model built with continuous variables and determine which is more accurate.
Window-opening behavior is a binary classification problem (0 represents a closed window, 1 represents an open window), and Binary Logistic (binary logistic regression) is the algorithm most often used in window-opening research in China and abroad. The window state is the dependent variable, and the factors influencing window-opening behavior are the independent variables.
In the Binary Logistic model, let the dependent variable be y, where the value "1" indicates that the event occurs and the value "0" indicates that it does not. Let the n independent variables influencing y be x1, x2, x3, ..., xn, let P be the probability that the event occurs, and let 1-P be the probability that it does not. The Binary Logistic model is then:
$$P = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)} \qquad (1)$$
To obtain the linear form of the Binary Logistic regression model, first take the odds of the event, that is, the ratio P/(1-P) of the probability that the event occurs to the probability that it does not, and then take its logarithm. The linear form of the model is:
$$\ln\frac{P}{1-P} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$
A regression coefficient of the Binary Logistic equation is therefore the change in the logarithm of the odds caused by a one-unit change in the corresponding independent variable. Because the odds are easier to understand than the logarithm of the odds, the equation is generally written in the following form:
$$\frac{P}{1-P} = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)$$
β0 is the constant. When the other explanatory variables are held fixed and the i-th independent variable changes by one unit, the odds change by a factor of exp(βi). A positive regression coefficient βi means that the probability of the event increases, with exp(βi)>1; a negative βi means that the probability decreases, with exp(βi)<1; and βi equal to 0, with exp(βi)=1, means that the odds do not change whether or not the independent variable is present, so the probability is unaffected.
For ease of understanding, let
$$Z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$
so that logistic regression equation (1) can be expressed in the following form:
$$P = \frac{1}{1 + e^{-Z}}$$
In the above formula, 0.5 is the cut-off value of P: P ≥ 0.5 represents an open window and P < 0.5 represents a closed window. The results of the binary logistic regression model are shown in Table 6, where the "B" value is the regression coefficient of each variable and the subscript identifies the dummy variable of each level of that variable. This gives the equation:
$$Z = B_{(i)}\mathrm{Time}_{(i)} + B_{(k)}\mathrm{RH}_{(k)} + B_{(k)}\mathrm{CO_2}_{(k)} + B_{(k)}\Phi_{(k)} + B_{(k)}\Delta T_{(k)} + B_{(k)}\Delta\mathrm{RH}_{(k)} + B_{(k)}\Delta\mathrm{PM2.5}_{(k)} + B_{(m)}\mathrm{PM2.5}_{(m)} + B_{(m)}V_{(m)} + B_{(n)}T_{(n)} + B_{(n)}R_{(n)} - 3.765 \qquad (6)$$
In the formula:
Time: time, i ∈ [1,11];
RH: indoor relative humidity, k ∈ [1,5];
CO2: indoor carbon dioxide concentration, k ∈ [1,5];
Φ: wind direction, k ∈ [1,5];
PM2.5: indoor PM2.5, k ∈ [1,5];
ΔT: indoor-outdoor temperature difference, k ∈ [1,5];
ΔRH: indoor-outdoor relative humidity difference, k ∈ [1,5];
ΔPM2.5: indoor-outdoor PM2.5 concentration difference, m ∈ [1,4];
V: wind speed, m ∈ [1,4];
T: indoor temperature, n ∈ [1,3];
R: solar radiation, n ∈ [1,3].
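How equation (6) turns into a prediction can be sketched as follows. Because every dummy variable is 0 or 1, Z reduces to the constant plus the B values of the dummy variables that are active for a given observation. The constant -3.765 is taken from equation (6); the B values in the example are hypothetical placeholders rather than entries of Table 6, and the function names are illustrative. The sketch takes P ≥ 0.5 as "open", consistent with coding an open window as 1.

```python
import math

INTERCEPT = -3.765  # constant term of equation (6)


def linear_predictor(active_coefficients, intercept=INTERCEPT):
    """Z = intercept + sum of B over the dummy variables that equal 1."""
    return intercept + sum(active_coefficients)


def open_probability(z):
    """Logistic function P = 1 / (1 + e^(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_window_state(z, cutoff=0.5):
    """Return 1 (open) when P >= cutoff, otherwise 0 (closed)."""
    return 1 if open_probability(z) >= cutoff else 0


# Hypothetical B values of the dummy variables active for one observation.
active_b = [1.2, 0.8, 2.0]
z = linear_predictor(active_b)
print(f"Z={z:.3f}  P={open_probability(z):.3f}  state={predict_window_state(z)}")

# With every variable at its reference level, only the constant remains.
z_ref = linear_predictor([])
print(f"Z={z_ref:.3f}  P={open_probability(z_ref):.3f}  state={predict_window_state(z_ref)}")
```

Since P ≥ 0.5 exactly when Z ≥ 0, the cut-off can equally be applied to Z directly.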
The logistic regression analysis yields the dummy-variable parameters for each level of each independent variable, as shown in Table 6. In the table, "EXP(B)" is the exponential of the logistic regression coefficient "B"; equivalently, B is the natural logarithm of EXP(B). EXP(B) expresses how strongly a given dummy-variable level influences window opening relative to the reference category. For example, the indoor PM2.5 concentration is divided into four levels. "Level 1" covers the range 1-35 μg/m³ (see Table 1) and serves as the reference category, listed as "indoor PM2.5 concentration" in Table 6. The remaining three levels appear in Table 6 as "indoor PM2.5 concentration (1)", "indoor PM2.5 concentration (2)", and "indoor PM2.5 concentration (3)"; each is a dummy variable compared with the reference category. As Table 6 shows, their EXP(B) values are 1.676, 2.289, and 7.461 respectively. Relative to the reference category "indoor PM2.5 concentration", the odds of window opening are therefore 1.676 times as high for "indoor PM2.5 concentration (1)", 2.289 times as high for "indoor PM2.5 concentration (2)", and 7.461 times as high for "indoor PM2.5 concentration (3)". These three values show that the higher the indoor PM2.5 concentration, the more likely the window is to be open. A possible explanation is that opening the window when outdoor pollution is severe lets outdoor PM2.5 particles in, which raises the indoor PM2.5 concentration.
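The link between B and EXP(B) can be checked numerically. The sketch below takes the three EXP(B) values quoted in the text for the indoor PM2.5 dummy levels and recovers the corresponding coefficients as B = ln(EXP(B)); the variable names are illustrative only.

```python
import math

# EXP(B) values quoted in the text for the three indoor PM2.5 dummy levels.
exp_b = {1: 1.676, 2: 2.289, 3: 7.461}

# B is the natural logarithm of EXP(B); equivalently EXP(B) = e**B.
b = {level: math.log(value) for level, value in exp_b.items()}

for level in sorted(b):
    # Round-trip check: exponentiating B gives back EXP(B).
    assert math.isclose(math.exp(b[level]), exp_b[level])
    print(f"level {level}: B = {b[level]:.3f}, EXP(B) = {exp_b[level]}")

# EXP(B) > 1 means higher odds of an open window than at the reference level;
# here the odds ratio grows with every level.
increasing = exp_b[1] < exp_b[2] < exp_b[3]
print("odds ratio increases with level:", increasing)
```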
Table 6. Multi-parameter binary logistic regression results for window-opening behavior
Whether an independent variable is significant in the fitted model is usually judged from the probability (Sig) value of the Wald test statistic. If the Sig value of an independent variable in the final result is less than 0.05, that variable has a significant effect on the dependent variable and the two are correlated. In the regression results of Table 6, however, the Sig values of the dummy variables of a few independent variables exceed 0.05, which means that those levels do not differ statistically significantly from the reference category with respect to window-opening behavior. Analysis of variance carried out on all independent variables before the model was built showed that each independent variable is statistically significant for window-opening behavior (see Table 2), so these non-significant Sig values do not prevent the variables from being included. It follows that, when a binary logistic regression model is built and some dummy variables of the same independent variable are statistically significant while others are not, all of the dummy variables should be included in the model so that each retains its correct meaning. The Enter (forced entry) method should therefore be selected when dummy variables are introduced into the model, so that all of them are retained in the final model.
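How a measured value becomes a set of dummy variables can be sketched as follows. This is a minimal illustration, assuming a four-level variable whose first level is the reference category. Only the upper bound of level 1 (35 μg/m³) appears in the text; the other two bounds and the function names are hypothetical.

```python
from bisect import bisect_left

# Upper bounds of levels 1..3 for indoor PM2.5 in ug/m3. Only the first bound (35)
# appears in the text; 75 and 115 are hypothetical. Level 4 is everything above.
UPPER_BOUNDS = [35, 75, 115]

def level_of(value, upper_bounds=UPPER_BOUNDS):
    """Return the 1-based level whose range contains value."""
    return bisect_left(upper_bounds, value) + 1

def dummy_code(level, n_levels):
    """Code a level as n_levels - 1 dummies; level 1 is the reference (all zeros)."""
    return [1 if level == k else 0 for k in range(2, n_levels + 1)]

readings = [20, 60, 90, 150]  # hypothetical measurements
rows = [dummy_code(level_of(x), n_levels=4) for x in readings]
for x, row in zip(readings, rows):
    print(x, row)
```

With forced entry, all three dummy columns of such a variable enter the model together, even when some of them are individually non-significant; dropping one column would silently merge that level into the reference category and change what the remaining coefficients mean.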
Table 7 compares the model analysis parameters. Omnibus test of model coefficients: the Omnibus significance values are all below 0.05, which indicates that the linear relationship between the independent variables and Logit P is significant and that using this model is reasonable. Goodness of fit: the Nagelkerke R² statistic ranges from 0 to 1, and values closer to 1 indicate a better fit of the regression equation; the comparison shows that the model built with dummy variables fits better than the continuous-variable model. The Hosmer-Lemeshow goodness-of-fit test evaluates the calibration of the prediction model and is an important indicator of how accurately the model predicts the probability of the window state. It reflects the agreement between what the model predicts and what actually occurs, so it is also called consistency. Good calibration means that the prediction model is accurate. If the test result is statistically significant (P < 0.05), there is a difference between the predicted and observed values and the calibration is poor; conversely, P > 0.05 indicates good calibration. The comparison shows that the calibration result of the dummy-variable model, 0.122, is much higher than that of the continuous-variable model, i.e. the dummy-variable prediction model is more accurate than the continuous-variable model. AUC is the area under the ROC curve and measures the discrimination of the prediction model. The prediction model sets a cutoff value, classifies cases above the cutoff as window open and cases below it as window closed, and thereby distinguishes the window states; this is the discrimination of the window-opening prediction model. The larger the AUC, the better the model discriminates: in general, AUC < 0.6 is considered poor discrimination, 0.6-0.75 indicates some discriminating ability, and AUC > 0.75 indicates good discriminating ability. The comparison shows that the AUC of the dummy-variable model is 0.81, which is good and higher than that of the continuous-variable model. The comparison also shows that the accuracy of the dummy-variable model in predicting window-opening behavior is 74.8%, higher than that of the continuous-variable model. Overall, in both model fit and prediction accuracy, the logistic regression model built with dummy variables outperforms the continuous-variable model.
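The discrimination and accuracy measures described above can be sketched in a few lines. This is a minimal illustration, not the procedure used to produce Table 7: the AUC is computed as the share of (open, closed) pairs in which the open case receives the higher predicted probability, the bands follow the thresholds quoted in the text, and the observations and probabilities are hypothetical.

```python
def auc(y_true, scores):
    """AUC as the probability that a random open case outranks a random closed case."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def discrimination_label(a):
    """Bands quoted in the text: < 0.6 poor, 0.6-0.75 some, > 0.75 good."""
    if a < 0.6:
        return "poor"
    return "some" if a <= 0.75 else "good"

def accuracy(y_true, scores, cutoff=0.5):
    """Share of cases classified correctly when P >= cutoff means open."""
    hits = sum((s >= cutoff) == (y == 1) for y, s in zip(y_true, scores))
    return hits / len(y_true)

# Hypothetical observed states (1 = open) and predicted probabilities.
y_obs = [1, 1, 1, 0, 0, 0]
p_hat = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
a = auc(y_obs, p_hat)
print(round(a, 3), discrimination_label(a), round(accuracy(y_obs, p_hat), 3))
```

AUC does not depend on the 0.5 cutoff, whereas accuracy does, which is why the two figures are reported separately.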
Table 7. Comparison of model parameters
The method provided by the invention predicts window-opening behavior with a binary logistic model using dummy variables. Dummy variables represent the differences in influence on window opening among the levels of each independent variable, and the regression model assigns a regression coefficient to every dummy variable. This explains in more detail how strongly each independent variable influences window-opening behavior at its different levels, and it has more practical significance for explaining window-opening behavior than the single regression coefficient per independent variable used in previous research.
A specific example is used herein to illustrate the principle and implementation of the invention. The description of the above embodiment is intended only to aid understanding of the method of the invention and its core idea. Those skilled in the art may, following the idea of the invention, vary the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.
Claims (5)
1. A method for predicting window-opening behavior based on a binary logistic model using dummy variables, characterized in that the method comprises the following steps:
S1, multivariate analysis of variance: collecting a plurality of indoor and outdoor variables; dividing each variable into levels according to the distribution of the measured data, with a balanced number of cases in each level; determining whether the categorical variable obtained after dividing each variable into levels follows a normal distribution; performing analysis of variance on the categorical variables that follow a normal distribution; and, according to the probability (Sig) value of the Wald test statistic, concluding that a variable has a significant effect on window-opening behavior if its Sig value is less than 0.05, thereby selecting the independent variables related to window-opening behavior;
S2, collinearity diagnosis: performing collinearity diagnosis on the selected independent variables related to window-opening behavior, and selecting the independent variables with low collinearity according to a set collinearity condition;
S3, binary logistic regression modeling and window-opening behavior prediction: setting dummy variables based on the independent variables selected in step S2, establishing a binary logistic regression model, obtaining the logistic regression coefficient of each dummy variable, and substituting the logistic regression coefficients into the binary logistic regression equation to obtain the predicted window-opening behavior.
2. The method for predicting window-opening behavior based on a binary logistic model using dummy variables according to claim 1, characterized in that selecting the independent variables with low collinearity according to the set collinearity condition in step S2 specifically comprises: selecting the independent variables that satisfy the set collinearity condition as the independent variables with low collinearity, wherein the collinearity condition is 0 < VIF < 5 and TOL > 0.2, VIF denoting the variance inflation factor of the independent variables and TOL denoting the tolerance of the independent variables.
3. The method for predicting window-opening behavior based on a binary logistic model using dummy variables according to claim 1, characterized in that collecting a plurality of indoor and outdoor variables in step S1 specifically comprises: collecting indoor temperature, indoor relative humidity, indoor CO2 concentration, indoor PM2.5 concentration, outdoor temperature, outdoor relative humidity, outdoor PM2.5 concentration, outdoor wind speed, outdoor wind direction, indoor/outdoor temperature difference, indoor/outdoor relative humidity difference, indoor/outdoor PM2.5 concentration difference, solar radiation, and time.
4. The method for predicting window-opening behavior based on a binary logistic model using dummy variables according to claim 3, characterized in that, in step S2, the selected independent variables with low collinearity comprise indoor temperature, indoor relative humidity, indoor CO2 concentration, indoor PM2.5 concentration, outdoor wind speed, outdoor wind direction, indoor/outdoor temperature difference, indoor/outdoor relative humidity difference, indoor/outdoor PM2.5 concentration difference, solar radiation, and time.
5. The method for predicting window-opening behavior based on a binary logistic model using dummy variables according to claim 4, characterized in that, in step S3, setting dummy variables based on the independent variables selected in step S2, establishing the binary logistic regression model, obtaining the logistic regression coefficient of each dummy variable, and substituting the logistic regression coefficients into the binary logistic regression equation to obtain the predicted window-opening behavior specifically comprises:
The binary logistic regression equation is expressed as:
In the formula, P uses 0.5 as the cutoff value: P ≥ 0.5 indicates that the window is open, and P < 0.5 indicates that it is closed;
$$
\begin{aligned}
Z ={}& B_{(i)}\,\mathrm{Time}_{(i)} + B_{(k)}\,\mathrm{RH}_{(k)} + B_{(k)}\,\mathrm{CO_2}_{(k)} + B_{(k)}\,\Phi_{(k)} + B_{(k)}\,\Delta T_{(k)} + B_{(k)}\,\Delta\mathrm{RH}_{(k)} \\
&+ B_{(k)}\,\Delta\mathrm{PM2.5}_{(k)} + B_{(m)}\,\mathrm{PM2.5}_{(m)} + B_{(m)}\,V_{(m)} + B_{(n)}\,T_{(n)} + B_{(n)}\,R_{(n)} - 3.765;
\end{aligned}
$$
wherein B denotes the regression coefficient of each independent variable and the subscript identifies the dummy variable for each level of that independent variable; Time denotes time, RH denotes indoor relative humidity, CO2 denotes indoor CO2 concentration, Φ denotes outdoor wind direction, ΔT denotes the indoor/outdoor temperature difference, ΔRH denotes the indoor/outdoor relative humidity difference, ΔPM2.5 denotes the indoor/outdoor PM2.5 concentration difference, PM2.5 denotes indoor PM2.5 concentration, V denotes outdoor wind speed, T denotes indoor temperature, and R denotes solar radiation.
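The collinearity condition of claim 2 (0 < VIF < 5 and TOL > 0.2) can be sketched as a simple filter. This is a minimal illustration under the usual definitions VIF = 1/(1 - R²) and TOL = 1/VIF, where R² comes from regressing one independent variable on all the others. The R² values and function names below are hypothetical and do not come from the patent's data.

```python
def vif_from_r2(r2):
    """VIF of a predictor from the R^2 of regressing it on the other predictors."""
    return 1.0 / (1.0 - r2)

def tolerance(vif):
    """Tolerance is the reciprocal of the variance inflation factor."""
    return 1.0 / vif

def low_collinearity(vif):
    """Screening rule from claim 2: 0 < VIF < 5 and TOL > 0.2."""
    return 0 < vif < 5 and tolerance(vif) > 0.2

# Hypothetical R^2 values, for illustration only.
r2_by_variable = {"indoor temperature": 0.40, "outdoor temperature": 0.90}
kept = [name for name, r2 in r2_by_variable.items()
        if low_collinearity(vif_from_r2(r2))]
print(kept)
```

Because TOL is the reciprocal of VIF, the two halves of the condition select the same variables; stating both simply matches the way statistical packages report the diagnostics.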
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910776641.9A CN110472197A (en) | 2019-08-22 | 2019-08-22 | Method based on the prediction windowing behavior of Binary Logistic model use dummy variable |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110472197A true CN110472197A (en) | 2019-11-19 |
Family
ID=68512729
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910776641.9A Pending CN110472197A (en) | 2019-08-22 | 2019-08-22 | Method based on the prediction windowing behavior of Binary Logistic model use dummy variable |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110472197A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191119 |