CN110889536A - Method and system for predicting and early warning situation - Google Patents
- Publication number
- CN110889536A (application CN201911035111.5A)
- Authority
- CN
- China
- Prior art keywords
- prediction
- seasonal
- data
- parameters
- alarm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Tourism & Hospitality (AREA)
- Human Resources & Organizations (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Marketing (AREA)
- Development Economics (AREA)
- General Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Quality & Reliability (AREA)
- Game Theory and Decision Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Operations Research (AREA)
- Educational Administration (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
An alarm situation prediction and early warning method comprises the following steps: first, predicting the number of specific alarms in a prediction time period; and second, issuing early warnings for the specific alarm situation. The invention uses all available data, including real-name track data, one-standard three-real data, alarm data, predecessor (prior-record) data, data crawled from the Internet such as weather and regional building (business) information, and manually labeled data such as urban village data. It uses multiple models for prediction and early warning, and trains separate prediction models and early warning models for different alarm types.
Description
Technical Field
The invention belongs to the technical field of alarm situation analysis, and in particular relates to an alarm situation prediction and early warning method and system.
Background
Current alarm situation prediction and early warning rely on statistical analysis of historical alarms in the time and space dimensions: the time dimension considers year-on-year and period-on-period comparisons, and the space dimension considers sub-bureaus and police stations. This approach considers only historical factors, and future alarm situations are inferred manually from historical statistics. It is strongly influenced by the subjectivity of the analysts, cannot make full use of other existing data, and cannot produce an objective forecast that combines spatial factors with current time factors.
Disclosure of Invention
Based on the above, an alarm situation prediction and early warning method and system are provided to address the above technical problems.
In order to solve the technical problems, the invention adopts the following technical scheme:
an alarm situation prediction and early warning method comprises the following steps:
predicting the number of specific alarms in a prediction time period, wherein the prediction time period comprises a preset prediction starting date and a prediction number of days:
(111) counting the specific alarm condition occurring before the prediction starting date to obtain the time sequence data of the specific alarm condition: the date on which the specific alarm condition occurs and the occurrence number of the specific alarm condition on the date;
(112) performing statistical analysis on the time sequence data by adopting a seasonal autoregressive moving average model SARIMA to obtain the trend and the distribution of the specific alarm, and determining trend parameters and seasonal parameters according to the analysis result, wherein the trend parameters comprise a trend autoregressive order p, a trend difference order d and a trend moving average order q, and the seasonal parameters comprise a seasonal autoregressive order P, a seasonal difference order D, a seasonal moving average order Q and a seasonal trend parameter s;
(113) constructing a quantity prediction model of the specific alarm according to the trend parameters and the seasonal parameters: SARIMA (p, d, q) x (P, D, Q, s), and training the quantity prediction model on the time sequence data of the specific alarm so that it fits that data;
(114) inputting the prediction days into the quantity prediction model to obtain a prediction result in the prediction time period;
secondly, early warning is carried out on specific warning situations:
(121) dividing a jurisdiction into grid cells of n × n meters on an electronic map, and setting three region granularities from small to large: grid, police station jurisdiction and sub-bureau jurisdiction, n is 450-;
(122) at the three granularities, the spatio-temporal characteristics of each region at each day of history are constructed:
122a, constructing activity characteristics in the current area according to the real-name track database and the predecessor database, wherein the activity characteristics comprise the total number of people / the number of men / the number of predecessors (persons with a prior record) active in hotels in the current area during the n1/n2/n3/n4 days before the current day, and the same three counts for internet cafes in the current area during the n1/n2/n3/n4 days before the current day;
122b, constructing the alarm characteristics of the specific alarm in the current area according to an alarm database, wherein the alarm characteristics comprise the number of the specific alarms in the current area during the n1/n2/n3/n4 days before the current day;
122c, obtaining point-of-interest attribute characteristics in the current area from the electronic map;
122d, constructing the regional attribute characteristics in the current area according to a one-standard three-real database;
122e, acquiring weather data from the Internet to construct weather characteristics in the current area of the day;
(123) setting a label for each historical day of each area: judging whether the number of the specific alarms in the N days after the current day (including the current day) is greater than the number of the specific alarms in the N days before the current day and is not 0; if so, the label of the current day is set to 1, otherwise it is set to 0;
(124) according to the alarm database, counting the historical daily average number of the specific alarm in each grid to obtain the alarm high-rate grids, namely the grids whose historical daily average alarm number ranks above the 90% level;
(125) at the three granularities, constructing a first feature set and a label set from the spatio-temporal characteristics and the corresponding labels of each area, training a neural network model with the first feature set and the label set, taking the output of the intermediate layer as a second feature set, and saving the intermediate layer model;
(126) training a boosting tree model with the second feature set and the label set at the three granularities;
(127) constructing the spatio-temporal characteristics of each alarm high-rate grid on the early-warning current date according to steps 122a to 122e, inputting them into the boosting tree model, obtaining the probability that the number of the specific alarms in each alarm high-rate grid in the N days after the current date is greater than the number in the N days before it, and issuing early warnings according to the probability;
wherein the number of prediction days is 1-7, n1 < n2 < n3 < n4, and N is 1-10.
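The labeling rule of step (123) and the grid selection of step (124) can be sketched as follows. This is only an illustrative reading, not the patent's implementation: the function names are hypothetical, the "N days after the current day" are assumed to include the current day, and "higher than 90%" is assumed to mean above the 90% rank among all grids.

```python
from datetime import date, timedelta

def make_label(daily_counts, day, n_days):
    """Return 1 if the alarm count in the N days starting at `day` (inclusive)
    exceeds the count in the N days before `day` and is not 0, else 0.
    `daily_counts` maps date -> number of alarms; missing dates count as 0."""
    after = sum(daily_counts.get(day + timedelta(days=k), 0) for k in range(n_days))
    before = sum(daily_counts.get(day - timedelta(days=k), 0)
                 for k in range(1, n_days + 1))
    return 1 if after > before and after != 0 else 0

def high_rate_grids(avg_daily_counts, level=0.9):
    """Return the grid ids whose historical daily average is above the value
    found at the given rank level (assumed reading of 'higher than 90%')."""
    ranked = sorted(avg_daily_counts.values())
    cutoff = ranked[int(level * (len(ranked) - 1))]
    return {grid for grid, avg in avg_daily_counts.items() if avg > cutoff}
```

With N = 2 and counts of 2, 1, 4, 3 on four consecutive days, the third day gets label 1 because 4 + 3 is greater than 1 + 2.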
The step (112) further comprises:
112a, differencing the time sequence data d times and using a unit root test to check the stationarity of the differenced data; if the differenced data is stationary, the trend difference order is determined as d; otherwise d is increased by 1 and differencing continues until stationary time sequence data is obtained;
112b, plotting the partial autocorrelation graph and the autocorrelation graph of the stationary time sequence data; when the partial autocorrelation graph shows a significant spike at delay i but no similar spike at larger delays, p is set equal to i; when the autocorrelation graph shows a significant spike at delay j but no similar spike at larger delays, q is set equal to j; the models ARMA (0, q), ARMA (p, 0) and ARMA (p, q) are then constructed respectively, the Akaike information criterion AIC = -2 log(L) + 2(p + q + k + 1) of the stationary time sequence data is calculated for the three models, the model with the minimum AIC is selected, and the parameters (p, q) are determined, wherein L is the likelihood function of the stationary time sequence data, k = 1 when c ≠ 0, k = 0 when c = 0, and c is the average change between consecutive observed values;
112c, constructing the model ARMA (p, q) according to the parameters (p, q), calculating the residuals of the stationary time sequence data under the model ARMA (p, q), using the Durbin-Watson (D-W) test to check whether the residuals are autocorrelated, and drawing a quantile (Q-Q) plot to check whether the residuals follow a normal distribution with a mean of 0 and constant variance, thereby confirming the selected parameters (p, q); if these conditions are not met, returning to the previous step to reselect the parameters (p, q), choosing smaller values close to the previously selected parameters;
112d, taking the time sequence data obtained in step (111), taking time delays of 7 days, 1 month and 3 months with corresponding seasonal trend parameters s of 7, 12 and 4 respectively, performing seasonal differencing with each time delay in turn, performing a unit root test on the differenced data to judge its stationarity, and selecting the seasonal trend parameter s that gives the best stationarity;
112e, after determining the seasonal trend parameter s, repeating step 112a on the seasonally differenced data to obtain the seasonal difference order D, and repeating steps 112b and 112c to obtain the seasonal autoregressive order P and the seasonal moving average order Q.
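The model comparison in step 112b and the residual check in step 112c can be illustrated with a small sketch. It assumes the log-likelihood of each fitted candidate is already available from some fitting routine (not shown), and it takes k = 1 when c is nonzero and k = 0 when c is zero. The function names are hypothetical.

```python
def aic(log_likelihood, p, q, c):
    """AIC = -2 log(L) + 2(p + q + k + 1), with k = 1 if c != 0 else 0."""
    k = 1 if c != 0 else 0
    return -2.0 * log_likelihood + 2.0 * (p + q + k + 1)

def select_by_aic(candidates):
    """`candidates` is a list of (p, q, log_likelihood, c) tuples, one per
    fitted model; return the (p, q) pair of the model with the smallest AIC."""
    best = min(candidates, key=lambda m: aic(m[2], m[0], m[1], m[3]))
    return best[0], best[1]

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive residual differences
    divided by the sum of squared residuals. Values near 2 suggest no
    first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den
```

For example, a candidate with log-likelihood -100, p = 2, q = 1 and nonzero c has AIC = 200 + 2(2 + 1 + 1 + 1) = 210.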
Early warning according to the probability means marking the corresponding grids in four different colors:
labeling the grids with a probability greater than 0.68 in a first color;
labeling the grids with a probability greater than 0.34 and less than or equal to 0.68 in a second color;
labeling the grids with a probability less than or equal to 0.34 in a third color;
and labeling the grids that are not alarm high-rate grids in a fourth color.
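The four-color rule above amounts to a simple threshold mapping. The sketch below is illustrative only: the function name is hypothetical, and the return values are color slots because the text does not name concrete colors.

```python
def warning_color(probability, is_high_rate_grid=True):
    """Map a grid's predicted probability to one of the four color slots.
    Grids that are not alarm high-rate grids always get the fourth color."""
    if not is_high_rate_grid:
        return "fourth"
    if probability > 0.68:
        return "first"
    if probability > 0.34:
        return "second"
    return "third"
```

The boundaries follow the text: a probability of exactly 0.68 falls in the second color and exactly 0.34 in the third.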
The step (122) further comprises:
other time-related characteristics are constructed: whether the day is a weekend, whether the day is a holiday, and whether the day is an adjusted (shifted) rest day.
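These time-related characteristics could be derived per day roughly as follows. The holiday and adjusted-day calendars are assumed to be supplied by the caller as sets of dates. Reading the third flag as an adjusted (shifted) day is an interpretation, and all names here are hypothetical.

```python
from datetime import date

def time_features(day, holidays=frozenset(), adjusted_days=frozenset()):
    """Return the three calendar flags for `day`.
    `holidays` and `adjusted_days` are caller-supplied sets of dates."""
    return {
        "is_weekend": day.weekday() >= 5,   # Saturday = 5, Sunday = 6
        "is_holiday": day in holidays,
        "is_adjusted_day": day in adjusted_days,
    }
```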
The invention also relates to an alarm situation prediction and early warning system, comprising a storage module in which a plurality of instructions are stored, the instructions being loaded and executed by a processor to carry out the method described above: steps (111) to (114) for predicting the number of specific alarms in the prediction time period, steps (121) to (127) for early warning of the specific alarm situation, the refinements 112a to 112e of step (112), the four-color marking of grids according to the probability, and the additional time-related characteristics of step (122), with the same parameter ranges (prediction days 1-7, n1 < n2 < n3 < n4, N 1-10).
The invention uses all available data, including real-name track data, one-standard three-real data, alarm data, predecessor (prior-record) data, data crawled from the Internet such as weather and regional building (business) information, and manually labeled data such as urban village data. It uses multiple models for prediction and early warning, and trains separate prediction models and early warning models for different alarm types.
Detailed Description
An alarm situation prediction and early warning method comprises the following steps:
First, predicting the number of specific alarms in a prediction time period, wherein the prediction time period comprises a preset prediction starting date and a number of prediction days; the number of prediction days is 1-7, and is 7 in this embodiment.
Wherein, the specific alarm condition refers to a certain type of alarm condition.
The invention trains independent quantity prediction models aiming at different alarm situations, and can more accurately predict the quantity of corresponding alarm situations.
(111) Counting a specific alarm condition occurring before the prediction starting date to obtain time sequence data of the specific alarm condition: the date on which the particular alert occurred and the number of occurrences of the particular alert on that date.
(112) Performing statistical analysis on the time sequence data by adopting a seasonal autoregressive moving average model SARIMA to obtain the trend and the distribution of the specific alarm, and determining trend parameters and seasonal parameters according to the analysis result, wherein the trend parameters comprise a trend autoregressive order p, a trend difference order d and a trend moving average order q, and the seasonal parameters comprise a seasonal autoregressive order P, a seasonal difference order D, a seasonal moving average order Q and a seasonal trend parameter s:
112a, differencing the time sequence data d times (lag-1 differencing) and using a unit root test to check the stationarity of the differenced data; if the differenced data is stationary, the trend difference order is determined as d; otherwise d is increased by 1 and differencing continues until stationary time sequence data is obtained. For example, the time sequence data is differenced once, and if the result is stationary the difference order is 1; otherwise the data is differenced twice, and if the result is stationary the difference order is 2; otherwise it is differenced 3 times, and so on. Generally, stationarity is reached within 3 differences.
112b, plotting the partial autocorrelation graph and the autocorrelation graph of the stationary time sequence data; when the partial autocorrelation graph shows a significant spike at delay i but no similar spike at larger delays, p is set equal to i; when the autocorrelation graph shows a significant spike at delay j but no similar spike at larger delays, q is set equal to j. The models ARMA (0, q) (namely p = 0), ARMA (p, 0) (namely q = 0) and ARMA (p, q) are then constructed respectively, the Akaike information criterion AIC = -2 log(L) + 2(p + q + k + 1) of the stationary time sequence data is calculated for the three models, the model with the minimum AIC is selected, and the parameters (p, q) are determined, wherein L is the likelihood function of the stationary time sequence, k = 1 when c ≠ 0, k = 0 when c = 0, and c is the average change between consecutive observed values. The consecutive observed values are the successive values of the stationary time sequence after d differences; each data value in the sequence is called an observed value. For example, if the observed value on 2019-07-07 is 20 and the change to the observed value on 2019-07-08 is 5, that change of 5 is one change between consecutive observed values, and the average of the differences between the observed values of all adjacent dates is the value c.
112c, constructing the model ARMA (p, q) according to the determined parameters (p, q), calculating the residuals of the stationary time sequence data under the model ARMA (p, q), using the Durbin-Watson (D-W) test to check whether the residuals are autocorrelated, and drawing a quantile (Q-Q) plot to check whether the residuals follow a normal distribution with a mean of 0 and constant variance, thereby confirming the selected parameters (p, q); if these conditions are not met, returning to the previous step to reselect the parameters (p, q), choosing smaller values close to the previously selected parameters.
112d, taking the time sequence data obtained in step (111), taking the time delays m as 7 days, 1 month and 3 months with corresponding seasonal trend parameters s of 7, 12 and 4 respectively, performing seasonal differencing (differencing with time delay m) with each time delay in turn, performing a unit root test on the differenced data to judge its stationarity, and selecting the seasonal trend parameter s that gives the best stationarity.
112e, after determining the seasonal trend parameter s, repeating step 112a on the seasonally differenced data to obtain the seasonal difference order D, and repeating steps 112b and 112c to obtain the seasonal autoregressive order P and the seasonal moving average order Q.
(113) Constructing a quantity prediction model of the specific alarm according to the trend parameters and the seasonal parameters: SARIMA (p, d, q) x (P, D, Q, s), and training the quantity prediction model on the time sequence data of the specific alarm so that it fits that data.
(114) And inputting the predicted days into the quantity prediction model to obtain a prediction result in the prediction time period.
Because forecasts become less accurate as the number of prediction days grows, the number of prediction days in this embodiment should not be too large (<7). The model is retrained every day (the parameters do not need to be determined again; once fixed they stay fixed, and only one more day of data is added to the training data), and after retraining the new prediction results overwrite those made the previous day.
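The two differencing operations used when choosing the orders, namely repeated lag-1 differencing for the difference order and lag-m differencing for the seasonal check, can be sketched in plain Python as below. The unit root test that decides when to stop is not shown, and the function names are hypothetical.

```python
def difference(series, lag=1):
    """One pass of differencing: x[t] - x[t - lag].
    lag=1 is ordinary differencing; lag=m gives a seasonal difference."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def difference_d_times(series, d):
    """Apply lag-1 differencing d times in a row."""
    for _ in range(d):
        series = difference(series)
    return series
```

Each pass shortens the series by `lag` values, so a seasonal difference with a delay of 7 days needs more than 7 observations.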
Taking alarm A as an example, assume the prediction starting date is 2019-05-05; the number of alarms A must then be predicted from the time sequence data of alarm A before 2019-05-05. Suppose the time sequence data before 2019-05-05 is:
...
2019-05-01 26
2019-05-02 31
2019-05-03 28
2019-05-04 24
The model SARIMA (p, d, q) x (P, D, Q, s) is determined from the above data by the above steps, trained on the time sequence data so that it fits the data, and then used for prediction. Because the model is a time series model, training on the time sequence data up to a certain date allows it to predict directly the data for a specified number of days after that date; for example, if the training data covers 2019-03-01 to 2019-05-04, the trained model can directly predict the data for the specified number of prediction days that follow. If the specified number of prediction days is 4, data for the four days 2019-05-05 to 2019-05-08 is predicted.
Once the data for 2019-05-05 is available, the parameters of the model SARIMA (p, d, q) x (P, D, Q, s) remain unchanged, the 2019-05-05 data is added to the training data (which now covers 2019-03-01 to 2019-05-05), and the model is retrained. The number of prediction days is still 4, so data for the four days 2019-05-06 to 2019-05-09 is predicted: the previous day's predictions for 2019-05-06 to 2019-05-08 are overwritten, and a prediction for 2019-05-09 is added.
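The daily retrain-and-overwrite bookkeeping in this example can be sketched as follows. Only the date handling is shown: the predicted values passed in are made-up placeholders standing in for model output, and the function names are hypothetical.

```python
from datetime import date, timedelta

def forecast_dates(last_training_day, horizon):
    """Dates covered by a forecast made after training through
    `last_training_day`: the next `horizon` days."""
    return [last_training_day + timedelta(days=k) for k in range(1, horizon + 1)]

def update_forecasts(stored, last_training_day, predictions):
    """Write one run's predictions into `stored` (date -> value),
    overwriting earlier predictions for the same dates."""
    days = forecast_dates(last_training_day, len(predictions))
    for day, value in zip(days, predictions):
        stored[day] = value
    return stored

stored = {}
update_forecasts(stored, date(2019, 5, 4), [25, 27, 26, 30])  # covers 05-05..05-08
update_forecasts(stored, date(2019, 5, 5), [28, 26, 29, 31])  # covers 05-06..05-09
```

After the second run, the entries for 2019-05-06 to 2019-05-08 hold the newer values and 2019-05-09 has been added, while the entry for 2019-05-05 keeps the value from the first run.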
Secondly, early warning is carried out on specific warning situations:
(121) dividing a jurisdiction into grid cells of n × n meters on an electronic map, and setting three region granularities from small to large: grid, police station jurisdiction and sub-bureau jurisdiction, n is 450-.
(122) At three granularities, spatio-temporal characteristics of each region at each day of history are constructed.
The history is generally the 2-3 years before the current date, that is, data from the 2-3 years before the current date is acquired. The history length depends on the data in the database: if the database holds less data (<2 years), all of the available data is acquired. If the current date is September 24, 2019 and early warning is needed for a specific alarm situation, the early-warning current date is September 24, 2019.
122a, constructing activity characteristics in the current area according to the real-name track database and the predecessor database, wherein the activity characteristics comprise the total number of people / the number of men / the number of predecessors active in hotels in the current area during the previous n1/n2/n3/n4 days, and the same three counts for internet cafes in the current area during the previous n1/n2/n3/n4 days, wherein n1 < n2 < n3 < n4.
The "predecessor" herein refers to a person with a prior record of the specific alarm type.
In the present embodiment, n1, n2, n3 and n4 are 1, 2, 5 and 7, respectively, the same applies below.
For example, for the historical day 2018-08-08, the activity features of area A on that day include the total number of people active in all hotels of area A in the previous 1 day (2018-08-07), in the previous 2 days (2018-08-06 to 2018-08-07), in the previous 5 days and in the previous 7 days.
For a hotel, the number of people is the number of people registered to check in; for the internet bar, the number of people is the number of people who register to surf the internet.
122b, constructing the alarm features of the specific alarm situation in the current area from the alarm database. The alarm features comprise the number of alarms of the specific type in the current area during the previous n1/n2/n3/n4 days. For example, for the current day 2018-08-08 and area A, the features include the number of alarms of type A in area A in the previous 1 day (2018-08-07) and in the previous 2 days (2018-08-06 to 2018-08-07).
122c, obtaining the point-of-interest attribute features of the current area from the electronic map: the numbers of hotels, internet cafes, building sites, traffic hubs, cameras, districts, markets, bars, KTVs and banks, the distance to the nearest police station, whether the area is rural, etc. The distance to the nearest police station is calculated from the center of the area.
122d, constructing the area attribute features of the current area from the "one standard, three actuals" database (the standard three-entity database): the size and distribution of the area's population, the density and distribution of its buildings, the distribution of its businesses, etc.
122e, acquiring weather data from the Internet to construct weather characteristics in the current area of the day: such as weather, temperature, wind, air quality, statistics thereof, etc.
Other time-dependent features can also be constructed: whether the day falls on a weekend, whether it is a holiday, and whether it is a rest day.
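The windowed counts used in steps 122a and 122b (totals over the previous 1, 2, 5 and 7 days) could be computed along the following lines. This is a minimal sketch that assumes the daily counts for one area are already available as a date-to-count mapping; the function name `window_totals` and the sample counts are hypothetical.

```python
from datetime import date, timedelta

WINDOWS = (1, 2, 5, 7)  # n1 < n2 < n3 < n4 as in this embodiment


def window_totals(daily_counts: dict, current_day: date, windows=WINDOWS) -> dict:
    """Sum the daily counts over the n days immediately before current_day."""
    totals = {}
    for n in windows:
        days = [current_day - timedelta(days=k) for k in range(1, n + 1)]
        totals[n] = sum(daily_counts.get(day, 0) for day in days)
    return totals


# Hypothetical daily counts for area A (e.g. hotel check-ins or alarms of type A)
counts = {date(2018, 8, 1): 4, date(2018, 8, 6): 2, date(2018, 8, 7): 3}
features = window_totals(counts, date(2018, 8, 8))
```

The current day itself is excluded, matching the example in which the 1-day window for 2018-08-08 is 2018-08-07.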
(123) Setting a label for each historical day of each area: if the number of specific alarms in the N days starting from and including the current day is greater than the number in the N days before the current day and is not 0, the label of the current day is 1; otherwise the label is 0. N is 1-10; in this embodiment N is 7, and the same applies below. For example, if the current day is 2018-08-08 and the number of alarms in the following 7 days (2018-08-08 to 2018-08-14) is greater than the number in the previous 7 days (2018-08-01 to 2018-08-07) and is not 0, the label equals 1; otherwise the label equals 0.
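The labeling rule of step (123) can be sketched as follows. The function name `make_label` and the sample counts are hypothetical, and the "is not 0" condition is applied here to the count of the following window, which is one possible reading of the text.

```python
from datetime import date, timedelta


def make_label(daily_alarms: dict, current_day: date, n: int = 7) -> int:
    """Return 1 if the alarms in the n days from current_day onward outnumber those in the n days before it."""
    following = sum(daily_alarms.get(current_day + timedelta(days=k), 0) for k in range(n))
    previous = sum(daily_alarms.get(current_day - timedelta(days=k), 0) for k in range(1, n + 1))
    # Assumption: "is not 0" refers to the following-window count.
    return 1 if following > previous and following != 0 else 0


# Hypothetical alarm counts for one area
alarms = {date(2018, 8, 3): 1, date(2018, 8, 8): 2, date(2018, 8, 10): 1}
label = make_label(alarms, date(2018, 8, 8))
```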
(124) According to the alarm database, counting the historical daily average number of alarms of the specific type in each grid, to obtain the alarm high-rate grids, i.e. the grids whose historical daily average number of alarms is higher than 90%.
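The text does not spell out what "higher than 90%" is measured against. The sketch below assumes one plausible reading, namely that a grid qualifies when its historical daily average exceeds that of at least 90% of all grids; the function name `high_rate_grids` and the grid identifiers are hypothetical.

```python
def high_rate_grids(avg_daily_alarms: dict, share: float = 0.9) -> set:
    """Select grids whose daily average is higher than that of at least `share` of all grids.

    Assumption: "higher than 90%" is read as a percentile rank across grids.
    """
    values = list(avg_daily_alarms.values())
    total = len(values)
    selected = set()
    for grid_id, avg in avg_daily_alarms.items():
        lower = sum(1 for v in values if v < avg)
        if lower / total >= share:
            selected.add(grid_id)
    return selected


# Hypothetical averages for 20 grids: g0 has 0.0 alarms per day, g19 has 1.9
averages = {f"g{i}": i / 10 for i in range(20)}
hot = high_rate_grids(averages)
```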
(125) At the three granularities, a first feature set and a label set are constructed from the spatio-temporal features of each area and the corresponding labels; a neural network model is trained on the first feature set and the label set, the output of its intermediate layer is taken as a second feature set, and the intermediate layer model is stored.
For example, at the grid granularity and for alarm type A, a single-layer neural network model (NN) is constructed, the spatio-temporal features of alarm type A and the corresponding labels are fed to the model for training, the output of the intermediate layer is taken as the new feature set, and the intermediate layer model is stored.
The spatio-temporal features are as follows:
time 1, region 1, feature 1.1, feature 2.1, feature 3.1, feature 4.1, ..., label 1
time 2, region 1, feature 1.2, feature 2.2, feature 3.2, feature 4.2, ..., label 2
time 1, region 2, feature 1.3, feature 2.3, feature 3.3, feature 4.3, ..., label 3
time 2, region 2, feature 1.4, feature 2.4, feature 3.4, feature 4.4, ..., label 4
(126) Training a boosted tree model (the "lifting tree" model) on the second feature set and the label set at the three granularities:
At each of the three granularities, the second feature set and the label set are divided into a training set, a validation set and a test set. The training set is used to train the model, the validation set is used to monitor how far training has progressed and to stop training in time, and the test set is used to check the generalization ability of the model (its ability to predict unseen data). The hyper-parameters of the boosted tree model are set and the model is trained; training stops when the accuracy of the model on the validation set no longer improves, and the generalization ability of the model is then checked on the test set. The hyper-parameters are adjusted, and the training step and the test-set check are repeated. Finally, the group of hyper-parameters that performs best on the test set is selected, the boosted tree model is retrained, and the trained boosted tree model is stored.
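The stopping rule "stop training when the accuracy on the validation set no longer improves" can be sketched independently of any particular boosted tree library. The `patience` parameter (how many non-improving rounds are tolerated) is an assumption, as are the function name and the sample accuracy values.

```python
def early_stopping_round(val_accuracy: list, patience: int = 3) -> int:
    """Return the index of the best round, stopping after `patience` rounds without improvement."""
    best_round, best_score = 0, float("-inf")
    for i, score in enumerate(val_accuracy):
        if score > best_score:
            best_round, best_score = i, score
        elif i - best_round >= patience:
            break  # validation accuracy has stopped improving
    return best_round


# Hypothetical validation accuracy after each boosting round
history = [0.60, 0.70, 0.72, 0.71, 0.72, 0.70, 0.90]
best = early_stopping_round(history, patience=3)
```

In this run training halts at round 5, so the later value 0.90 is never reached; that is the usual trade-off of early stopping with a small patience.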
(127) Constructing the spatio-temporal features of each alarm high-rate grid on the early-warning current date according to steps 122a to 122e, inputting them into the boosted tree model to obtain the probability that the number of specific alarms in each alarm high-rate grid in the N days following the current date is greater than the number in the preceding N days, and issuing an early warning according to that probability.
In this embodiment, the corresponding grid is labeled and warned by four different colors:
labeling the grids with probability greater than 0.68 by a first color (red);
labeling the grids with the probability of being greater than 0.34 and less than or equal to 0.68 through a second color (orange);
labeling the grids with the probability less than or equal to 0.34 through a third color (yellow);
and marking the grids that are not alarm high-rate grids with a fourth color (green).
Of course, the warning can be performed in other ways according to the probability.
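The four-color scheme of this embodiment maps directly onto a small function. The thresholds 0.68 and 0.34 and the colors come from the text above; the function name and its signature are hypothetical.

```python
def warning_color(probability: float, is_high_rate_grid: bool = True) -> str:
    """Map a predicted probability to the warning color of this embodiment."""
    if not is_high_rate_grid:
        return "green"   # fourth color: grids that are not alarm high-rate grids
    if probability > 0.68:
        return "red"     # first color
    if probability > 0.34:
        return "orange"  # second color: 0.34 < probability <= 0.68
    return "yellow"      # third color: probability <= 0.34
```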
For example, for alarm type A, the following spatio-temporal features exist:
time 1, region 1, feature 1.1, feature 2.1, feature 3.1, feature 4.1, ..., label 1
time 2, region 1, feature 1.2, feature 2.2, feature 3.2, feature 4.2, ..., label 2
time 1, region 2, feature 1.3, feature 2.3, feature 3.3, feature 4.3, ..., label 3
time 2, region 2, feature 1.4, feature 2.4, feature 3.4, feature 4.4, ..., label 4
A first feature set X is obtained, i.e.
X = [feature 1.1, feature 2.1, feature 3.1, feature 4.1, ...
feature 1.2, feature 2.2, feature 3.2, feature 4.2, ...
feature 1.3, feature 2.3, feature 3.3, feature 4.3, ...
feature 1.4, feature 2.4, feature 3.4, feature 4.4, ...
...],
and a label set y is obtained, i.e. y = [label 1, label 2, label 3, label 4, ...].
X and y are input into a neural network model for training, the intermediate layer model is stored, and a second feature set is obtained, i.e.
[new feature 1.1, new feature 2.1, new feature 3.1, new feature 4.1, ...
new feature 1.2, new feature 2.2, new feature 3.2, new feature 4.2, ...
new feature 1.3, new feature 2.3, new feature 3.3, new feature 4.3, ...
new feature 1.4, new feature 2.4, new feature 3.4, new feature 4.4, ...
...],
The second feature set and the label set y are then taken as input, divided into a training set, a validation set and a test set, and used to train a boosted tree model.
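The step from the per-row records to the first feature set X and the label set y can be sketched as below. The row layout (time, region, features, label) follows the listing above; the function name and the numeric values are hypothetical placeholders.

```python
def split_features_and_labels(rows):
    """Drop the time and region columns, returning the feature matrix X and the label list y."""
    X = [list(row[2:-1]) for row in rows]
    y = [row[-1] for row in rows]
    return X, y


# Hypothetical rows: (time, region, feature 1, feature 2, feature 3, feature 4, label)
rows = [
    ("time 1", "region 1", 0.2, 3.0, 1.0, 0.0, 1),
    ("time 2", "region 1", 0.1, 2.0, 0.0, 1.0, 0),
    ("time 1", "region 2", 0.4, 5.0, 1.0, 1.0, 0),
    ("time 2", "region 2", 0.3, 4.0, 0.0, 0.0, 1),
]
X, y = split_features_and_labels(rows)
```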
Suppose the early-warning current date is 2019-05-05 and N is 7. The data used for training is the data before 2019-05-05 (2019-03-01 to 2019-05-04): the spatio-temporal features and labels of all areas for 2019-03-01 to 2019-05-04 are constructed. The feature set is then passed through the neural network intermediate layer model to obtain the new feature set, which is input into the trained boosted tree model; the model outputs a predicted value, which is the probability that the number of alarms in the current area in the 7 days following the early-warning current date is greater than the number in the previous 7 days.
The invention uses all available data, including real-name track data, "one standard, three actuals" data, alarm data, prior-record ("predecessor") data, data crawled from the Internet (such as weather and regional building and business information) and manually labeled data (such as urban village data), and uses several models for prediction and early warning; independent prediction models and early-warning models (boosted tree models) are trained for the different alarm situations.
The scheme also relates to an alarm situation prediction and early warning system comprising a storage module that stores a plurality of instructions, which are loaded and executed by a processor to carry out the prediction and early warning method described above:
The number of specific alarms in a prediction time period is predicted, wherein the prediction time period comprises a preset prediction start date and a number of prediction days; the number of prediction days is 1-7, and is 7 in this embodiment.
Wherein, the specific alarm condition refers to a certain type of alarm condition.
The invention trains independent quantity prediction models aiming at different alarm situations, and can more accurately predict the quantity of corresponding alarm situations.
(111) Counting a specific alarm condition occurring before the prediction starting date to obtain time sequence data of the specific alarm condition: the date on which the particular alert occurred and the number of occurrences of the particular alert on that date.
(112) Performing statistical analysis on the time series data with the seasonal autoregressive integrated moving average model SARIMA to obtain the trend and distribution of the specific alarm situation, and determining trend parameters and seasonal parameters from the analysis result. The trend parameters comprise the trend autoregressive order p, the trend difference order d and the trend moving average order q; the seasonal parameters comprise the seasonal autoregressive order P, the seasonal difference order D, the seasonal moving average order Q and the seasonal trend parameter s:
112a, differencing the time series data d times (first-order differences) and testing the stationarity of the differenced data with a unit root test. If the differenced data is stationary, the difference order is set to d; otherwise d is increased by 1 and differencing continues until stationary time series data is obtained. For example, the time series data is differenced once; if the result is stationary, the difference order is 1. Otherwise the data is differenced twice; if the result is stationary, the difference order is 2. Otherwise the data is differenced three times, and so on. Generally, differencing 3 times is enough to make the data stationary.
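The loop in step 112a can be sketched as follows. The unit root test is not implemented here: `is_stationary` is a hypothetical callable standing in for it (in practice something like an augmented Dickey-Fuller test would be plugged in), and the cap of 3 differences reflects the remark that 3 differences are generally enough.

```python
from typing import Callable, Sequence


def difference(series: Sequence[float]) -> list:
    """First-order difference: each value minus the one before it."""
    return [series[i] - series[i - 1] for i in range(1, len(series))]


def find_difference_order(series: Sequence[float],
                          is_stationary: Callable[[Sequence[float]], bool],
                          max_order: int = 3):
    """Difference repeatedly until `is_stationary` accepts the result; return (d, differenced data)."""
    current = list(series)
    for d in range(1, max_order + 1):
        current = difference(current)
        if is_stationary(current):
            return d, current
    raise ValueError(f"series not stationary after {max_order} differences")


# Toy stand-in for the unit root test: accept only a constant series.
def constant_check(xs):
    return max(xs) == min(xs)


d, stationary = find_difference_order([1, 4, 9, 16, 25], constant_check)
```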
112b, drawing the partial autocorrelation plot and the autocorrelation plot of the stationary time series data. If the partial autocorrelation plot shows a significant spike at lag i but no comparable spike at larger lags, p is set equal to i; if the autocorrelation plot shows a significant spike at lag j but no comparable spike at larger lags, q is set equal to j. The models ARMA(0, q) (i.e. p = 0), ARMA(p, 0) (i.e. q = 0) and ARMA(p, q) are then constructed, the Akaike information criterion AIC = -2 log(L) + 2(p + q + k + 1) of the stationary time series data is calculated for the three models, the model with the smallest AIC is selected, and the parameters (p, q) are determined. Here L is the likelihood of the stationary time series data, k = 1 when c ≠ 0 and k = 0 when c = 0, and c is the average change between consecutive observations. The consecutive observations are the consecutive values of the stationary time series obtained after d differences, each data value in the series being one observation. For example, if the observation on 2019-07-07 is 20 and the change from it to the observation on 2019-07-08 is 5, the average of such changes over all pairs of adjacent dates is the value c.
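The AIC comparison in step 112b can be written out directly from the formula AIC = -2 log(L) + 2(p + q + k + 1). In the sketch below the log-likelihood values are hypothetical placeholders (in practice they come from fitting each candidate model), and the function names are illustrative.

```python
def aic(log_likelihood: float, p: int, q: int, c: float) -> float:
    """AIC = -2 log(L) + 2(p + q + k + 1), with k = 1 if c != 0 and k = 0 otherwise."""
    k = 1 if c != 0 else 0
    return -2.0 * log_likelihood + 2 * (p + q + k + 1)


def select_by_aic(candidates: list) -> tuple:
    """Return the (p, q) of the candidate with the smallest AIC.

    Each candidate is a tuple (p, q, log_likelihood, c).
    """
    best = min(candidates, key=lambda m: aic(m[2], m[0], m[1], m[3]))
    return best[0], best[1]


# Hypothetical fitted log-likelihoods for ARMA(0, q), ARMA(p, 0) and ARMA(p, q)
candidates = [
    (0, 2, -105.0, 0.5),
    (1, 0, -104.0, 0.5),
    (1, 2, -103.5, 0.5),
]
chosen = select_by_aic(candidates)
```

The penalty term grows with p + q, so a larger model is chosen only if its likelihood gain outweighs the extra parameters.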
112c, constructing the model ARMA(p, q) from the determined parameters (p, q) and calculating the residuals of the stationary time series data under this model. The Durbin-Watson (D-W) test is used to check whether the residuals are autocorrelated, and a quantile (Q-Q) plot is drawn to check whether the residuals follow a normal distribution with mean 0 and constant variance, which further confirms the selected parameters (p, q). If these conditions are not met, return to the previous step and reselect the parameters (p, q), choosing smaller values close to those previously selected.
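The Durbin-Watson statistic used in step 112c is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch, with hypothetical residual values:

```python
def durbin_watson(residuals: list) -> float:
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals that alternate in sign (negative first-order autocorrelation)
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
# Hypothetical residuals that never change (positive first-order autocorrelation)
dw_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])
```

The statistic lies between 0 and 4; values near 2 indicate no first-order autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation.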
112d, taking the time series data obtained in step (111) and the delays m of 7 days, 1 month and 3 months, with corresponding seasonal trend parameters s of 7, 12 and 4. Each delay is used in turn for seasonal differencing (differencing with delay m), a unit root test is applied to the differenced data to judge its stationarity, and the seasonal trend parameter s that gives the best stationarity is selected as the value of s.
112e, after the seasonal trend parameter s is determined, repeating step 112a on the seasonally differenced data to obtain the seasonal difference order D, and repeating steps 112b and 112c to obtain the seasonal autoregressive order P and the seasonal moving average order Q.
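Seasonal differencing with delay m, followed by choosing the delay that leaves the most stable data, can be sketched as below. The stability score is an assumption: the text uses a unit root test, whereas this sketch substitutes the population variance of the differenced data purely as a stand-in. The function names and the toy weekly series are hypothetical.

```python
from statistics import pvariance
from typing import Callable, Sequence


def seasonal_difference(series: Sequence[float], lag: int) -> list:
    """Difference each value against the value `lag` steps earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def pick_seasonal_lag(series: Sequence[float],
                      candidate_lags: Sequence[int],
                      instability: Callable = pvariance) -> int:
    """Return the lag whose seasonally differenced data scores lowest on `instability`.

    Assumption: variance is only a placeholder for a proper unit root test statistic.
    """
    return min(candidate_lags, key=lambda lag: instability(seasonal_difference(series, lag)))


# Toy daily series with an exact weekly pattern
weekly = [1, 2, 3, 4, 5, 6, 7] * 4
best_lag = pick_seasonal_lag(weekly, [5, 7])
```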
(113) Constructing the quantity prediction model of the specific alarm situation from the trend parameters and the seasonal parameters, namely SARIMA(p, d, q) × (P, D, Q, s), and training the quantity prediction model on the time series data of the specific alarm situation so that it fits the data.
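For reference, the usual textbook form of a SARIMA(p, d, q) × (P, D, Q, s) model is given below. It is not stated in the source; the symbols B (backshift operator, B y_t = y_{t-1}), the coefficient names and the noise term ε_t are the standard notation, used here as an assumption about the model the steps refer to.

```latex
\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t
  = c + \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t,
\qquad
\begin{aligned}
\phi_p(B)     &= 1 - \phi_1 B - \dots - \phi_p B^{p}, &
\theta_q(B)   &= 1 + \theta_1 B + \dots + \theta_q B^{q},\\
\Phi_P(B^{s}) &= 1 - \Phi_1 B^{s} - \dots - \Phi_P B^{Ps}, &
\Theta_Q(B^{s}) &= 1 + \Theta_1 B^{s} + \dots + \Theta_Q B^{Qs}.
\end{aligned}
```

Here y_t is the number of alarms on day t, (1 - B)^d is the d-fold ordinary differencing of step 112a, and (1 - B^s)^D is the D-fold seasonal differencing of steps 112d and 112e.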
(114) And inputting the predicted days into the quantity prediction model to obtain a prediction result in the prediction time period.
Because predictions become less accurate as the number of prediction days grows, the number of prediction days in this embodiment should not be too large (<7). The model is retrained every day: the parameters, once determined, stay fixed and are not determined again, and only one more day of data is added to the training data. After retraining, the new prediction results overwrite those of the previous day.
Taking alarm type A as an example and assuming that the prediction start date is 2019-05-05, the number of alarms of type A must be predicted from the time series data of alarm type A before 2019-05-05. If the time series data before 2019-05-05 is:
...
2019-05-01 26
2019-05-02 31
2019-05-03 28
2019-05-04 24
we determine the model SARIMA(p, d, q) × (P, D, Q, s) from the above data by the above steps, train the model on the time series data so that it fits the data, and then predict with the trained model. Because the model is a time series model, training it on the time series data before a given time period allows it to directly predict the data for a specified number of days after that period. For example, if the training data covers 2019-03-01 to 2019-05-04, the trained model can directly predict the data for the specified number of prediction days; if the specified number of prediction days is 4, it predicts the four days 2019-05-05 to 2019-05-08.
When the current date becomes 2019-05-05, the parameters of the model SARIMA(p, d, q) × (P, D, Q, s) remain unchanged; the data of 2019-05-05 is added to the training data, so that the training data covers 2019-03-01 to 2019-05-05, and the model is retrained. The number of prediction days is still 4, so the four days 2019-05-06 to 2019-05-09 are predicted: yesterday's predictions for 2019-05-06 to 2019-05-08 are overwritten, and the prediction for 2019-05-09 is added.
However, those skilled in the art should realize that the above embodiments are illustrative only and not limiting to the present invention, and that changes and modifications to the above described embodiments are intended to fall within the scope of the appended claims, provided they fall within the true spirit of the present invention.
Claims (8)
1. An alarm situation prediction and early warning method is characterized by comprising the following steps:
predicting the number of specific alarms in a prediction time period, wherein the prediction time period comprises a preset prediction starting date and a prediction number of days:
(111) counting the specific alarm condition occurring before the prediction starting date to obtain the time sequence data of the specific alarm condition: the date on which the specific alarm condition occurs and the occurrence number of the specific alarm condition on the date;
(112) performing statistical analysis on the time series data by adopting a seasonal autoregressive integrated moving average model SARIMA to obtain the trend and the distribution of the specific alarm, and determining trend parameters and seasonal parameters according to the analysis result, wherein the trend parameters comprise a trend autoregressive order p, a trend difference order d and a trend moving average order q, and the seasonal parameters comprise a seasonal autoregressive order P, a seasonal difference order D, a seasonal moving average order Q and a seasonal trend parameter s;
(113) constructing a quantity prediction model of the specific alarm according to the trend parameters and the seasonal parameters: SARIMA(p, d, q) × (P, D, Q, s), and training the quantity prediction model on the time series data of the specific alarm so that it fits the time series data of the specific alarm;
(114) inputting the prediction days into the quantity prediction model to obtain a prediction result in the prediction time period;
secondly, early warning is carried out on specific warning situations:
(121) dividing a district into grid areas of n × n meters on an electronic map, and setting three granularity areas from small to large: grid, district and sub-district, n is 450-;
(122) at the three granularities, the spatio-temporal characteristics of each region at each day of history are constructed:
122a, constructing activity characteristics in the current area according to the real-name track database and the predecessor database, wherein the activity characteristics comprise the total number of activities/the number of men/the number of predecessors in the hotel in the current area n1/n2/n3/n4 days before the current day and the total number of activities/the number of men/the number of predecessors in the internet cafe in the current area n1/n2/n3/n4 days before the current day;
122b, constructing the alarm characteristics of the specific alarm in the current area according to an alarm database, wherein the alarm characteristics comprise the alarm quantity of the specific alarm in the current area n1/n2/n3/n4 days before the current day;
122c, obtaining interest point attribute characteristics in the current area from the electronic map;
122d, constructing the regional attribute characteristics in the current region according to a standard three-entity database;
122e, acquiring weather data from the Internet to construct weather characteristics in the current area of the day;
(123) setting a label for each historical day of each area: judging whether the number of the specific alarms in the N days starting from and including the current day is greater than the number of the specific alarms in the N days before the current day and is not 0; if so, setting the label of the current day to 1, otherwise setting the label of the current day to 0;
(124) according to an alarm condition database, counting the historical daily average alarm condition number of the specific alarm condition in each grid to obtain an alarm condition high-rate grid with the historical daily average alarm condition number higher than 90%;
(125) on the three granularities, a first feature set and a label set are constructed through the space-time features and the corresponding labels of each region, a neural network model is trained through the first feature set and the label set, the output of the middle layer is taken as a second feature set, and the middle layer model is stored;
(126) training a boosted tree ("lifting tree") model through the second feature set and the label set at the three granularities;
(127) constructing the spatio-temporal features of each alarm high-rate grid on the early-warning current date according to the steps 122a to 122e, inputting the spatio-temporal features into the boosted tree model, obtaining the probability that the number of the specific alarms of each alarm high-rate grid in the N days following the current date is greater than the number of the specific alarms in the preceding N days, and giving an early warning according to the probability;
wherein the prediction days are 1-7, n1 < n2 < n3 < n4, and N is 1-10.
2. An alarm situation prediction and early warning method according to claim 1, wherein the step (112) further comprises:
112a, differencing the time series data d times and testing the stationarity of the differenced data with a unit root test; if the differenced data is stationary, determining the differencing order as d; otherwise, increasing d by 1 and continuing to difference until stationary time series data is obtained;
112b, drawing the partial autocorrelation plot and the autocorrelation plot of the stationary time series data; when the partial autocorrelation plot shows a pronounced spike at lag i but no comparable spike at larger lags, setting p equal to i; when the autocorrelation plot shows a pronounced spike at lag j but no comparable spike at larger lags, setting q equal to j; constructing the models ARMA(0, q), ARMA(p, 0) and ARMA(p, q) respectively, calculating the Akaike information criterion of the stationary time series data on the three models, AIC = -2 log(L) + 2(p + q + k + 1), selecting the model with the minimum AIC, and determining the parameters (p, q), wherein L is the likelihood of the stationary time series data, k is 1 when c is not equal to 0 and k is 0 when c is equal to 0, and c is the average change between consecutive observations;
112c, constructing the model ARMA(p, q) according to the parameters (p, q), calculating the residuals of the stationary time series data under the model ARMA(p, q), using the Durbin-Watson (D-W) test to check whether the residuals are autocorrelated, and drawing a quantile plot to check whether the residuals follow a normal distribution with a mean of 0 and constant variance, thereby confirming the selected parameters (p, q); if these conditions are not met, returning to the previous step to reselect the parameters (p, q), choosing smaller values close to the previously selected parameters;
112d, taking the time series data obtained in step (111), using lags of 7 days, 1 month and 3 months with corresponding seasonal trend parameters s of 7, 12 and 4, performing seasonal differencing with each lag in turn, applying a unit root test to the differenced data to judge its stationarity, and selecting the seasonal trend parameter s that gives the best stationarity;
112e, after determining the seasonal trend parameter s, repeating step 112a on the seasonally differenced data to obtain the seasonal differencing order D, and repeating steps 112b and 112c to obtain the seasonal autoregressive order P and the seasonal moving average order Q.
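The differencing of step 112a and the AIC comparison of step 112b can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names are hypothetical, the log-likelihoods are assumed to come from ARMA fits performed elsewhere, and k is taken as 1 when the constant c is non-zero (the usual AIC convention). A real unit root test, such as the augmented Dickey-Fuller test, would be applied to the output of `difference`; it is omitted here.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: x[t] - x[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def aic(log_likelihood, p, q, has_constant):
    """AIC = -2 log(L) + 2 (p + q + k + 1), with k = 1 if c != 0 else 0."""
    k = 1 if has_constant else 0
    return -2.0 * log_likelihood + 2.0 * (p + q + k + 1)


def select_order(candidates):
    """Pick (p, q) with the smallest AIC.

    candidates: list of (p, q, log_likelihood, has_constant) tuples,
    one per fitted model, e.g. ARMA(0, q), ARMA(p, 0) and ARMA(p, q).
    """
    best = min(candidates, key=lambda c: aic(c[2], c[0], c[1], c[3]))
    return best[0], best[1]


# Hypothetical log-likelihoods for three candidate models.
fits = [(0, 2, -105.0, False), (1, 0, -101.0, False), (1, 2, -100.5, False)]
chosen = select_order(fits)
```

With `lag=7` the same `difference` function performs the weekly seasonal differencing described in step 112d.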
3. An alarm situation prediction and early warning method according to claim 1 or 2, wherein the early warning according to the probability labels the corresponding grids with four different colors:
labeling grids with a probability greater than 0.68 with a first color;
labeling grids with a probability greater than 0.34 and less than or equal to 0.68 with a second color;
labeling grids with a probability less than or equal to 0.34 with a third color;
and labeling grids that are not warning high-rate grids with a fourth color.
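The four-color rule of claim 3 maps directly onto a small function. The sketch below is illustrative only; the function name and the string labels for the four colors are hypothetical, since the claim does not name specific colors.

```python
def grid_color(probability, is_high_rate_grid):
    """Map a grid's predicted probability to one of the four color classes."""
    if not is_high_rate_grid:
        return "fourth"   # grid is not a high-rate grid
    if probability > 0.68:
        return "first"    # probability > 0.68
    if probability > 0.34:
        return "second"   # 0.34 < probability <= 0.68
    return "third"        # probability <= 0.34
```

Note that the boundary values 0.68 and 0.34 fall into the lower class, matching the "less than or equal to" wording of the claim.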
4. An alarm situation prediction and early warning method according to claim 3, characterized in that said step (122) further comprises:
constructing other time-related features: whether the day is a weekend, whether the day is a holiday, and whether the day is a rest day.
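The three calendar features of claim 4 could be derived as in the sketch below. The `holidays` and `rest_days` sets are assumed inputs (for example, loaded from an official holiday calendar); the claim does not say where this information comes from, and the function and key names are hypothetical.

```python
from datetime import date


def time_features(day, holidays, rest_days):
    """Build the weekend / holiday / rest-day flags for one date."""
    return {
        "is_weekend": day.weekday() >= 5,   # Saturday = 5, Sunday = 6
        "is_holiday": day in holidays,
        "is_rest_day": day in rest_days,
    }


# Example with a hypothetical holiday calendar.
national_day = date(2019, 10, 1)
features = time_features(national_day, holidays={national_day}, rest_days=set())
```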
5. An alarm situation prediction and early warning system, characterized by comprising a storage module in which a plurality of instructions are stored, the instructions being loaded and executed by a processor to perform:
predicting the number of specific alarms in a prediction time period, wherein the prediction time period is defined by a preset prediction start date and a number of prediction days:
(111) counting the specific alarms that occurred before the prediction start date to obtain the time series data of the specific alarm: the dates on which the specific alarm occurred and the number of occurrences on each date;
(112) performing statistical analysis on the time series data using a seasonal autoregressive integrated moving average model (SARIMA) to obtain the trend and the distribution of the specific alarm, and determining trend parameters and seasonal parameters according to the analysis result, wherein the trend parameters comprise a trend autoregressive order p, a trend differencing order d and a trend moving average order q, and the seasonal parameters comprise a seasonal autoregressive order P, a seasonal differencing order D, a seasonal moving average order Q and a seasonal trend parameter s;
(113) constructing a quantity prediction model of the specific alarm according to the trend parameters and the seasonal parameters: SARIMA(p, d, q) × (P, D, Q, s), and training the quantity prediction model on the time series data of the specific alarm so that it fits that data;
(114) inputting the number of prediction days into the quantity prediction model to obtain the prediction result for the prediction time period;
secondly, issuing early warnings for the specific alarm:
(121) dividing a district into grid areas of n × n meters on an electronic map, and setting three granularity areas from small to large: grid, district and sub-district, n is 450-;
(122) at the three granularities, constructing the spatiotemporal features of each region on each historical day:
122a, constructing activity characteristics in the current area according to the real-name track database and the predecessor database, wherein the activity characteristics comprise the total number of activities/the number of men/the number of predecessors in the hotel in the current area n1/n2/n3/n4 days before the current day and the total number of activities/the number of men/the number of predecessors in the internet cafe in the current area n1/n2/n3/n4 days before the current day;
122b, constructing the alarm characteristics of the specific alarm in the current area according to an alarm database, wherein the alarm characteristics comprise the alarm quantity of the specific alarm in the current area n1/n2/n3/n4 days before the current day;
122c, obtaining interest point attribute characteristics in the current area from the electronic map;
122d, constructing the regional attribute characteristics in the current region according to a standard three-entity database;
122e, acquiring weather data from the Internet to construct weather characteristics in the current area of the day;
(123) setting a label for each historical day of each region: judging whether the number of specific alarms in the N days after the current day is greater than the number of specific alarms in the N days before the current day and is not 0; if so, setting the label of the current day to 1; otherwise, setting the label of the current day to 0;
(124) according to an alarm condition database, counting the historical daily average alarm condition number of the specific alarm condition in each grid to obtain an alarm condition high-rate grid with the historical daily average alarm condition number higher than 90%;
(125) on the three granularities, a first feature set and a label set are constructed through the space-time features and the corresponding labels of each region, a neural network model is trained through the first feature set and the label set, the output of the middle layer is taken as a second feature set, and the middle layer model is stored;
(126) training a lifting tree model through the second feature set and a tag set at the three granularities;
(127) constructing the spatiotemporal features of each warning high-rate grid on the current early-warning date according to steps 122a to 122e, inputting the spatiotemporal features into the lifting tree model, obtaining the probability that the number of specific warning conditions in each warning high-rate grid in the N days after the current date is greater than the number in the N days before it, and issuing early warnings according to the probability;
wherein the number of prediction days is 1 to 7, n1 < n2 < n3 < n4, and N is 1 to 10.
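The label construction of step (123) can be illustrated with a short sketch. It assumes that the label is 1 when the alarm count in the N days after a given day exceeds the count in the N days before it and is non-zero, which is the reading consistent with the probability predicted in step (127). It further assumes that neither window includes the day itself and that days without a full window on both sides get no label; the claim does not settle these details, and the function name is hypothetical.

```python
def make_labels(daily_counts, n):
    """Label each day 1 if alarms in the next n days outnumber the previous n days.

    daily_counts: list of alarm counts for one region, one entry per day.
    Days lacking a complete n-day window on either side are labeled None.
    """
    labels = []
    for t in range(len(daily_counts)):
        if t - n < 0 or t + n >= len(daily_counts):
            labels.append(None)          # incomplete window
            continue
        before = sum(daily_counts[t - n:t])          # days t-n .. t-1
        after = sum(daily_counts[t + 1:t + n + 1])   # days t+1 .. t+n
        labels.append(1 if after > before and after != 0 else 0)
    return labels


labels = make_labels([0, 1, 0, 2, 3, 0, 0], n=2)
```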
6. An alarm situation prediction and early warning system according to claim 5, wherein the step (112) further comprises:
112a, differencing the time series data d times and testing the stationarity of the differenced data with a unit root test; if the differenced data is stationary, determining the differencing order as d; otherwise, increasing d by 1 and continuing to difference until stationary time series data is obtained;
112b, drawing the partial autocorrelation plot and the autocorrelation plot of the stationary time series data; when the partial autocorrelation plot shows a pronounced spike at lag i but no comparable spike at larger lags, setting p equal to i; when the autocorrelation plot shows a pronounced spike at lag j but no comparable spike at larger lags, setting q equal to j; constructing the models ARMA(0, q), ARMA(p, 0) and ARMA(p, q) respectively, calculating the Akaike information criterion of the stationary time series data on the three models, AIC = -2 log(L) + 2(p + q + k + 1), selecting the model with the minimum AIC, and determining the parameters (p, q), wherein L is the likelihood of the stationary time series data, k is 1 when c is not equal to 0 and k is 0 when c is equal to 0, and c is the average change between consecutive observations;
112c, constructing the model ARMA(p, q) according to the parameters (p, q), calculating the residuals of the stationary time series data under the model ARMA(p, q), using the Durbin-Watson (D-W) test to check whether the residuals are autocorrelated, and drawing a quantile plot to check whether the residuals follow a normal distribution with a mean of 0 and constant variance, thereby confirming the selected parameters (p, q); if these conditions are not met, returning to the previous step to reselect the parameters (p, q), choosing smaller values close to the previously selected parameters;
112d, taking the time series data obtained in step (111), using lags of 7 days, 1 month and 3 months with corresponding seasonal trend parameters s of 7, 12 and 4, performing seasonal differencing with each lag in turn, applying a unit root test to the differenced data to judge its stationarity, and selecting the seasonal trend parameter s that gives the best stationarity;
112e, after determining the seasonal trend parameter s, repeating step 112a on the seasonally differenced data to obtain the seasonal differencing order D, and repeating steps 112b and 112c to obtain the seasonal autoregressive order P and the seasonal moving average order Q.
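For reference, a seasonal model with trend orders (p, d, q) and seasonal orders (P, D, Q, s), as determined in steps 112a to 112e, is conventionally written in backshift-operator form as shown below, together with the AIC used for order selection. The symbols B (backshift operator), y_t (the daily alarm count), ε_t (white noise) and the polynomial names φ, Φ, θ, Θ are standard textbook notation and do not appear in the claims; k = 1 when the constant c is non-zero is the usual convention.

```latex
\[
\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t
  = c + \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t
\]
\[
\mathrm{AIC} = -2\log L + 2\,(p + q + k + 1), \qquad
k = \begin{cases} 1, & c \neq 0 \\ 0, & c = 0 \end{cases}
\]
```

Here (1 - B)^d is the ordinary differencing of step 112a and (1 - B^s)^D is the seasonal differencing of steps 112d and 112e.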
7. An alarm situation prediction and early warning system according to claim 5 or 6, wherein the early warning according to the probability labels the corresponding grids with four different colors:
labeling grids with a probability greater than 0.68 with a first color;
labeling grids with a probability greater than 0.34 and less than or equal to 0.68 with a second color;
labeling grids with a probability less than or equal to 0.34 with a third color;
and labeling grids that are not warning high-rate grids with a fourth color.
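The fourth color applies to grids outside the high-rate set selected in step (124). One possible way to select that set is sketched below. It assumes that "higher than 90%" means a historical daily average above the 90th percentile across all grids, which the claim does not state explicitly; the function name, the nearest-rank percentile rule and the grid identifiers are likewise hypothetical.

```python
def high_rate_grids(daily_average, percent=90):
    """Return the grid ids whose daily average exceeds the given percentile.

    daily_average: dict mapping grid id -> historical daily average alarm count.
    The cut-off uses the nearest-rank percentile, computed with integer math.
    """
    values = sorted(daily_average.values())
    rank = max(1, (percent * len(values) + 99) // 100)  # ceil(percent% of n)
    cutoff = values[rank - 1]
    return {grid for grid, avg in daily_average.items() if avg > cutoff}


# Ten hypothetical grids with daily averages 1..10.
averages = {f"g{i}": i for i in range(1, 11)}
selected = high_rate_grids(averages)
```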
8. An alarm situation prediction and early warning system according to claim 7, wherein the step (122) further comprises:
constructing other time-related features: whether the day is a weekend, whether the day is a holiday, and whether the day is a rest day.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911035111.5A CN110889536A (en) | 2019-10-29 | 2019-10-29 | Method and system for predicting and early warning situation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911035111.5A CN110889536A (en) | 2019-10-29 | 2019-10-29 | Method and system for predicting and early warning situation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110889536A (en) | 2020-03-17 |
Family
ID=69746569
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911035111.5A Pending CN110889536A (en) | 2019-10-29 | 2019-10-29 | Method and system for predicting and early warning situation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110889536A (en) |
2019-10-29: CN application CN201911035111.5A, published as CN110889536A, status Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101830522B1 (en) * | 2016-08-22 | 2018-02-21 | 가톨릭대학교 산학협력단 | Method for predicting crime occurrence of prediction target region using big data |
CN107392644A (en) * | 2017-06-19 | 2017-11-24 | 华南理工大学 | A kind of commodity purchasing predicts modeling method |
CN107331132A (en) * | 2017-08-04 | 2017-11-07 | 深圳航天智慧城市系统技术研究院有限公司 | A kind of method and system of Urban Fires hidden danger dynamic prediction monitoring |
CN109214716A (en) * | 2018-10-17 | 2019-01-15 | 四川佳联众合企业管理咨询有限公司 | Mountain fire risk profile modeling method based on stacking algorithm |
CN109447331A (en) * | 2018-10-17 | 2019-03-08 | 四川佳联众合企业管理咨询有限公司 | Mountain fire Risk Forecast Method based on stacking algorithm |
CN109376227A (en) * | 2018-10-29 | 2019-02-22 | 山东大学 | A kind of prison term prediction technique based on multitask artificial neural network |
CN110008979A (en) * | 2018-12-13 | 2019-07-12 | 阿里巴巴集团控股有限公司 | Abnormal data prediction technique, device, electronic equipment and computer storage medium |
Non-Patent Citations (6)
Title |
---|
ALIF RIDZUAN KHAIRUDDIN et al.: "Comparative Study on Artificial Intelligence Techniques in Crime Forecasting", 《APPLIED MECHANICS AND MATERIALS》 * |
SOKRATIS PAPADOPOULOS et al.: "Short-term electricity load forecasting using time series and ensemble learning methods", 《2015 IEEE POWER AND ENERGY CONFERENCE AT ILLINOIS (PECI)》 * |
SUHONG KIM et al.: "Crime Analysis Through Machine Learning", 《2018 IEEE 9TH ANNUAL INFORMATION TECHNOLOGY, ELECTRONICS AND MOBILE COMMUNICATION CONFERENCE (IEMCON)》 * |
丁红军 (Ding Hongjun): "基于Elman神经网络110警情预测研究" [Research on 110 alarm prediction based on an Elman neural network], 《网络安全技术与应用》 * |
赖慧慧 (Lai Huihui): "大数据背景下基于 ARMA 模型的增值税销项税额预测" [Forecasting value-added tax output tax with an ARMA model in the context of big data], 《税务研究》 * |
陈鹏 (Chen Peng) et al.: "基于时间序列模型的110警情数据预测研究" [Research on 110 alarm data prediction based on time series models], 《信息系统工程》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111353828A (en) * | 2020-03-30 | 2020-06-30 | 中国工商银行股份有限公司 | Method and device for predicting number of people arriving at store from network |
CN111353828B (en) * | 2020-03-30 | 2023-09-12 | 中国工商银行股份有限公司 | Method and device for predicting number of people coming to store at website |
CN113570846A (en) * | 2021-06-08 | 2021-10-29 | 北京交通大学 | Traffic warning situation analysis and research method, equipment and readable storage medium |
CN113570846B (en) * | 2021-06-08 | 2022-11-04 | 北京交通大学 | Traffic warning situation analysis and judgment method, equipment and readable storage medium |
CN114418071A (en) * | 2022-01-24 | 2022-04-29 | 中国光大银行股份有限公司 | Cyclic neural network training method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200317 |