CN117688825A - Prediction model meshing and event prediction method and system based on true errors - Google Patents
- Publication number: CN117688825A
- Application number: CN202211099148.6A
- Authority: CN (China)
- Legal status: Pending
Classifications
- G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
- G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
- G06N3/08: Learning methods (under G06N3/02 Neural networks)
- G06F2111/08: Probabilistic or stochastic CAD
- G06F2111/10: Numerical modelling
Abstract
The invention discloses a true-error-based grid division scheme for prediction models and an event prediction method, comprising the following steps: step one: divide the space into a plurality of large areas, divide each large area into a plurality of small areas, and train a prediction model with the large area as the unit; step two: calculate the upper bound of the true error of the current region division from the prediction results of the model; step three: minimize the true error to obtain the optimal region division scheme, and predict events according to that scheme. The invention also discloses a true-error-based grid division and event prediction system that implements the method. Compared with the traditional traversal method, the proposed method finds the optimal region division scheme more efficiently without a large loss of accuracy, and improves the performance of prediction-model-based crowdsourcing algorithms.
Description
Technical Field
The invention belongs to the technical field of spatio-temporal data mining, and relates to a true-error-based prediction model grid division and event prediction method and system.
Background
With the development of artificial intelligence technology, spatio-temporal prediction models have attracted increasing attention from academia and industry. In recent years, many studies have proposed prediction models that predict the number of events occurring in an area over a future period of time, for example the number of events or the crime rate. Such prediction models can effectively improve the benefits of crowdsourcing algorithms and help reduce crime. Most models divide the area into a plurality of rectangular grids before prediction and then predict the number of events in each grid. If the grid cells are too large, the prediction results cannot be used by prediction-model-based crowdsourcing algorithms; if they are too small, the prediction model incurs larger errors. Existing meshing schemes typically rely on the experience of individual researchers or on experimental results.
These meshing schemes have the following disadvantages: first, they all assume that the event distribution within a region is uniform, which is not true in practice; second, their region division depends mainly on researchers' personal experience or experimental results and lacks theoretical support; third, their optimization objective is to minimize the prediction error of the model, which does not necessarily maximize the benefit of the model in practical applications.
Disclosure of Invention
To overcome the defects of the prior art, the invention aims to provide a true-error-based prediction model meshing and event prediction method. The invention introduces the concepts of true error, model error and representation error to improve the benefit of applying the model in prediction-based crowdsourcing algorithms. By adjusting the mesh size to minimize the true error of the model, the prediction model can provide more accurate prediction results to the crowdsourcing algorithm. The method comprises the following specific steps:
Step (1): divide the space region according to a parameter n, and train a prediction model;
Step (2): calculate the upper bound of the true error of the current region division from the prediction results of the model;
Step (3): determine the optimal n by minimizing the true error, obtain the region division scheme, and predict events according to that scheme.
The specific steps of step (1) comprise:
Step (11): divide the region into $\sqrt{n}\times\sqrt{n}$ large areas of equal size, where n is a perfect square;
Step (12): divide each large area into $\sqrt{m}\times\sqrt{m}$ small areas of equal size, where m is a perfect square and n×m > N;
Step (13): train a prediction model on the historical data of the large areas.
In step (12), N is a hyperparameter chosen so that the event distribution within each small area is uniform; uniform means that an event is equally likely to occur at every location within a small area.
In step (13), the historical data are the numbers of events that occurred in each large area in each time period over the past month, where one time period is 30 minutes, giving 48 time periods per day; the prediction model is a deep learning model. In the invention, the prediction model may be a deep learning model such as DeepST or DMVST-Net. DeepST uses a convolutional neural network to process the proximity, period and trend information of the changes in event counts in the historical data and to extract relevant features. DMVST-Net analyzes the historical information through a temporal module and a spatial module: the temporal module processes the history of one area with a recurrent neural network, and the spatial module analyzes the influence of other areas on the event counts of neighboring areas with a recurrent neural network.
The specific steps of step (2) comprise:
Step (21): calculate the model error from the prediction results of the model, where the model error is the difference between the predicted value and the estimated value of the number of events in a small area.
Step (22): efficiently estimate the representation error with a numerical optimization method, where the representation error is the difference between the estimated value and the actual value of the number of events in a small area; here, the estimated number of events in a small area equals the actual number of events in its large area divided by m, i.e. the actual count of the large area is divided equally among its m small areas.
The numerical optimization method simplifies the computation of the joint Poisson distribution through recursion.
Step (23): the sum of the representation error and the model error is an upper bound of the true error, where the true error is the difference between the predicted value and the actual value of the number of events in a small area.
In step (21), the predicted number of events in a small area equals the model's prediction of the number of events in its large area for the next time period divided by m, and the estimated number of events in a small area equals the actual number of events in its large area for the next time period divided by m.
In step (23), the true error and its upper bound first decrease and then increase as n grows. A larger n means a finer division: each region is smaller, the randomness the model must predict is larger, and the model error is larger. A smaller n means larger regions (n = 1 amounts to directly predicting the number of orders of the whole city): the model error is then smaller, but the prediction result is of no help to crowdsourcing algorithms, and the representation error is larger. The true error is decomposed into the model error and the representation error, whose sum bounds it from above.
In the invention, the true error in step (3) is minimized with either the trisection search method or the local iteration method proposed by the invention.
The trisection search method exploits the fact that the true error first decreases and then increases, and divides all possible values of n into three equal parts. When the true error at the left trisection point is greater than that at the right trisection point, all values in the leftmost part are discarded; otherwise, all values in the rightmost part are discarded. Repeating this step finds the n that minimizes the true error.
The local iteration method uses the prior knowledge that the optimal value of n lies somewhere in the middle of its range. It initializes n from this prior knowledge and then searches for a better value in the neighborhood of n. If no better value exists near n, n is taken as the optimal solution found by the algorithm; otherwise, the current solution is updated to the better value and the search continues.
The invention also provides a true-error-based meshing and event prediction system that implements the above prediction model meshing and event prediction method, comprising a region division module, a prediction model generation module, an error calculation module and an optimization module;
the region division module divides the region into large areas and divides each large area into small areas;
the prediction model generation module generates a prediction model from the historical data of the large areas;
the error calculation module analyzes the model error, representation error and true error of the prediction model;
and the optimization module searches for the n that minimizes the true error.
Compared with the prior art, the beneficial effects of the invention are: compared with the traditional traversal method, the proposed method finds the optimal region division scheme more efficiently without a large loss of accuracy; the invention effectively improves existing prediction-model-based crowdsourcing algorithms; the proposed new index, the true error, can be used to measure the accuracy of model prediction; and the proposed method has theoretical support.
Drawings
Fig. 1 is a schematic view of the region division in an embodiment of the invention.
Fig. 2 is a diagram of the DeepST model structure in an embodiment of the invention.
Fig. 3 is a diagram of the relationships among the errors in an embodiment of the invention.
Fig. 4 is a diagram of the distribution of order data in Xi'an in an embodiment of the invention.
Detailed Description
The invention is described in further detail below with reference to specific examples and the drawings. Except for what is specifically mentioned below, the procedures, conditions and experimental methods for carrying out the invention are common general knowledge in the art, and the invention is not particularly limited in this respect.
The invention provides a true-error-based prediction model grid division scheme and an event prediction method, comprising the following steps: divide the space into a plurality of large areas, divide each large area into a plurality of small areas, and train a prediction model with the large area as the unit; calculate the upper bound of the true error of the current region division from the prediction results of the model; minimize the true error to obtain the optimal region division scheme, and predict events according to that scheme. The invention also provides a true-error-based grid division and event prediction system that implements the method. Compared with the traditional traversal method, the proposed method finds the optimal region division scheme more efficiently without a large loss of accuracy, and improves the performance of prediction-model-based crowdsourcing algorithms.
The invention provides a true-error-based prediction model grid division scheme, comprising the following steps:
(1) Divide the space region according to a parameter n, and train a prediction model;
(2) Calculate the upper bound of the true error of the current region division from the prediction results of the model;
(3) Determine the optimal n by minimizing the true error.
First, the parameter n is initialized: the whole space is divided into n large areas and each large area into m small areas. If the trisection search method is used to minimize the true error, n is initialized to 1; if the local iteration method is used, n is initialized to a value in the middle of its range. A prediction model is trained for the current region division, and the representation error, model error and true error are calculated. Finally, the optimal n is selected from all possible values by the local iteration method or the trisection search method so as to minimize the true error.
Example 1 Region division
The invention divides the whole space into n large areas and then divides each large area into m small areas, ensuring n×m > N for a given constant N. The invention assumes that when N is sufficiently large, the event distribution within each small area is sufficiently uniform; uniform means that events are generated with equal probability at every location in the next time period, not that the same number of events is generated at every location. The following explains how to select a suitable N.
First, the region is divided into $\sqrt{N}\times\sqrt{N}$ areas. Let $\alpha_{ij}$ denote the expected number of events in small region $r_{ij}$ in the next time period, which can be estimated from the historical event data of $r_{ij}$. Given a region $r_{ij}$ with expected event count $\alpha_{ij}$ and a positive integer K, divide $r_{ij}$ into K smaller regions and denote the expected number of events of each smaller region by $\alpha_{ijk}$ (k = 1, 2, ..., K). The events in $r_{ij}$ are considered uniformly distributed if and only if $\alpha_{ijk} = \alpha_{ij}/K$ holds for every $1 \le k \le K$.
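The uniformity condition above can be checked numerically. The following Python sketch is illustrative only: it assumes the $\alpha_{ijk}$ have already been estimated (for example as average historical counts of the K sub-regions) and, because such estimates are noisy, it replaces exact equality with a relative tolerance `tol`, a hypothetical parameter not specified in the text.

```python
from typing import Sequence


def is_uniform(alpha_sub: Sequence[float], tol: float = 0.1) -> bool:
    """Return True if the K sub-regions of r_ij share its expected events evenly.

    alpha_sub[k] is alpha_ijk, the expected number of events in sub-region k.
    The definition requires alpha_ijk == alpha_ij / K for every k; this sketch
    accepts a relative deviation of up to `tol` (hypothetical parameter).
    """
    K = len(alpha_sub)
    alpha_ij = sum(alpha_sub)              # expected events in the whole small region
    if K == 0 or alpha_ij == 0:
        return True                        # nothing to distribute
    share = alpha_ij / K                   # uniform share of each sub-region
    return all(abs(a - share) <= tol * share for a in alpha_sub)


print(is_uniform([2.0, 2.0, 2.0, 2.0]))   # True
print(is_uniform([4.0, 0.0, 2.0, 2.0]))   # False
```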
Next, the invention proposes an evaluation index $D_\alpha(N)$ that measures the non-uniformity of the event distribution over the region; the larger $D_\alpha(N)$, the more non-uniform the event distribution revealed by the division:
Note that $D_\alpha(N)$ increases as N increases. However, once N is large enough that the event distribution within each small region is uniform, $D_\alpha(N)$ no longer increases significantly; at this point $D_\alpha(NK) = D_\alpha(N)$ holds for any positive integer K.
Once N is large enough, increasing N further does not significantly increase $D_\alpha(N)$, which means that such an N is a suitable choice. In other words, one should choose an N large enough (typically 128) that $D_\alpha(N)$ reaches its maximum. Finally, an example of the region division is given.
As shown in Fig. 1, the solid lines divide the entire space into four large areas to be predicted. The dashed lines divide each large area into four small areas, and each point in an area represents an event in it. The number of events in each large area can be predicted directly by the deep learning model, and the number of events in each small area is then estimated from that of its large area. Because deep learning models generally assume that events are uniformly distributed within a large area, all small areas of a large area receive the same number of events, namely the large area's count divided by the number of its small areas. The numbers in Fig. 1 indicate the predicted counts of the small areas.
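A minimal sketch of the division in Fig. 1, assuming coordinates normalised to the unit square and row-major numbering of both large and small areas (both are illustrative choices, not taken from the text). `cell_of` maps an event to its large area i and small area j; `uniform_split` counts events per large area and assigns each of its small areas the share $\lambda_i/m$.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def cell_of(x: float, y: float, n: int, m: int) -> Tuple[int, int]:
    """Map a point of the unit square to (large area i, small area j), row-major."""
    a, b = math.isqrt(n), math.isqrt(m)
    if a * a != n or b * b != m:
        raise ValueError("n and m must be perfect squares")
    side = a * b                                # the fine grid has side x side cells
    col = min(int(x * side), side - 1)          # clamp x == 1.0 into the last column
    row = min(int(y * side), side - 1)
    i = (row // b) * a + (col // b)             # index of the large area
    j = (row % b) * b + (col % b)               # index of the small area inside it
    return i, j


def uniform_split(points: Iterable[Tuple[float, float]], n: int, m: int) -> List[float]:
    """Per large area i, the count lambda_i / m assigned to each of its small areas."""
    counts = Counter(cell_of(x, y, n, m)[0] for x, y in points)
    return [counts[i] / m for i in range(n)]


events = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.1), (0.3, 0.6)]
print(uniform_split(events, n=4, m=4))          # [0.5, 0.25, 0.25, 0.0]
```

With n = m = 4 as in Fig. 1, the point (0.9, 0.1) falls in large area 1, small area 1.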
Example 2 Model training and model error calculation
The invention trains a deep learning model on historical order data. The order data mainly contain the order number, the passenger's starting position and destination, and the time at which the order appeared; the specific fields are shown in Table 1. The number of orders in each area in each time period can be obtained from this information. The invention takes 30 minutes as one time period, trains the model on the order data of the previous 8 time periods, and predicts the number of orders in the next time period. The machine learning model used is mainly a regression model. Taking DeepST as an example, Fig. 2 shows its structure. DeepST extracts three types of information from the historical orders: proximity, period and trend. Proximity information refers to the 8 immediately preceding time periods; period information refers to the orders in the same time period on each of the previous 8 days; trend information refers to the same time period in each of the previous 8 weeks. DeepST feeds these three types of information into a convolutional neural network and integrates weather and other factors to predict future order counts.
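The three kinds of DeepST input can be located by simple index arithmetic on the 30-minute periods. This is a sketch assuming the periods are numbered consecutively from the start of the data with 48 periods per day; the helper names `slot_of` and `deepst_input_slots` are hypothetical, and weather and other external factors are not covered.

```python
from typing import Dict, List

SLOTS_PER_DAY = 48                      # 30-minute periods per day
SLOTS_PER_WEEK = 7 * SLOTS_PER_DAY


def slot_of(minutes_since_start: int) -> int:
    """Index of the 30-minute period containing a timestamp (minutes from data start)."""
    return minutes_since_start // 30


def deepst_input_slots(t: int, length: int = 8) -> Dict[str, List[int]]:
    """Periods whose counts feed the prediction for period t.

    Returns {} when fewer than `length` weeks of history precede t.
    """
    if t < length * SLOTS_PER_WEEK:
        return {}
    return {
        "proximity": [t - k for k in range(1, length + 1)],               # previous 8 periods
        "period": [t - k * SLOTS_PER_DAY for k in range(1, length + 1)],  # same period, previous 8 days
        "trend": [t - k * SLOTS_PER_WEEK for k in range(1, length + 1)],  # same period, previous 8 weeks
    }


slots = deepst_input_slots(3000)
print(slots["proximity"][:3], slots["period"][:2], slots["trend"][:2])
# [2999, 2998, 2997] [2952, 2904] [2664, 2328]
```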
Table 1 Order information
Denote the model by f, trained on historical data X. For each region $r_i$, the model predicts the number of events in the next time period, $\hat{\lambda}_i = f(x_i)$, where $x_i$ is the input of the model. The historical data of region $r_i$ are denoted $X_i$, so that $X = \bigcup_{i=1}^{n} X_i$, and the number of samples of region $r_i$ is denoted $|X_i|$. The mean absolute error MAE(f) is a standard measure of the error of model f; with $\lambda_i$ denoting the number of orders actually occurring in region $r_i$, the relation between the model error and the mean absolute error is derived as follows.
The model error can then be calculated from the mean absolute error:
where E denotes the expectation.
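Using the small-region quantities defined in Example 3, $\tilde{\lambda}_{ij} = \lambda_i/m$ and $\hat{\lambda}_{ij} = \hat{\lambda}_i/m$, and assuming the differences are measured in absolute value, the model error of a small region reduces to the large-region absolute error scaled by 1/m. The derivation below is a worked illustration under these assumptions:

```latex
\begin{aligned}
E_m(i,j) &= \mathbb{E}\left[\left|\tilde{\lambda}_{ij}-\hat{\lambda}_{ij}\right|\right]
          = \mathbb{E}\left[\left|\frac{\lambda_i}{m}-\frac{\hat{\lambda}_i}{m}\right|\right]
          = \frac{1}{m}\,\mathbb{E}\left[\left|\lambda_i-\hat{\lambda}_i\right|\right],\\
\sum_{j=1}^{m} E_m(i,j) &= \mathbb{E}\left[\left|\lambda_i-\hat{\lambda}_i\right|\right].
\end{aligned}
```

Replacing the expectation by the trained model's mean absolute error on region $r_i$ therefore gives an estimate of $E_m(i,j)$, and the m small regions together carry exactly the large-region error.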
Example 3 True error, representation error and model error calculation
Divide the whole space into n large regions $\{r_1, r_2, r_3, \dots, r_n\}$. The prediction model predicts the number of orders $\hat{\lambda}_i$ occurring in large region $r_i$, while the number of orders actually occurring there is $\lambda_i$. Each $r_i$ can be further divided into m small regions $\{r_{i1}, r_{i2}, r_{i3}, \dots, r_{im}\}$. The number of events occurring in small region $r_{ij}$ in the next time period is denoted $\lambda_{ij}$, where $\lambda_i = \sum_{j=1}^{m} \lambda_{ij}$.
In the absence of prior information on the event distribution, most prediction models consider, by the principle of maximum entropy, the distribution within a large region to be uniform, i.e. every small region within a large region has the same number of events. Thus, once the actual number of events $\lambda_i$ of large region $r_i$ is known, the estimated number of events of small region $r_{ij}$ is denoted $\tilde{\lambda}_{ij}$, where $\tilde{\lambda}_{ij} = \lambda_i / m$. Similarly, once the predicted number of events $\hat{\lambda}_i$ of large region $r_i$ is obtained, the predicted number of events of each small region is estimated as $\hat{\lambda}_{ij}$, where $\hat{\lambda}_{ij} = \hat{\lambda}_i / m$.
The differences among $\lambda_{ij}$, $\tilde{\lambda}_{ij}$ and $\hat{\lambda}_{ij}$ give rise to three errors: the model error, the representation error and the true error. Fig. 3 shows the relationship among them. In practice, it is often difficult for a spatio-temporal prediction model to capture the true event distribution, so without prior information the prediction $\hat{\lambda}_i$ of large region $r_i$ is divided evenly among its small regions. The true error is the difference between the actual number of events $\lambda_{ij}$ and the predicted number $\hat{\lambda}_{ij}$ of small region $r_{ij}$. For a small region $r_{ij}$, its true error $E_r(i,j)$ is defined as the expected difference between the predicted and the actual number of events in the next time period, i.e. $E_r(i,j) = \mathbb{E}\left[\,|\lambda_{ij} - \hat{\lambda}_{ij}|\,\right]$, where $\lambda_{ij}$ follows a given distribution P.
However, the true error is difficult to calculate directly in practice. The invention therefore introduces $\tilde{\lambda}_{ij}$ to decompose the true error into a representation error and a model error. For a given small region $r_{ij}$, its model error $E_m(i,j)$ is defined as the expected difference between the estimated number of events in the next time period and the model's prediction, i.e. $E_m(i,j) = \mathbb{E}\left[\,|\tilde{\lambda}_{ij} - \hat{\lambda}_{ij}|\,\right]$, where $\lambda_{ij}$ follows a given distribution P. Its representation error $E_e(i,j)$ is defined as the expected difference between the estimated and the actual number of events in the next time period, i.e. $E_e(i,j) = \mathbb{E}\left[\,|\tilde{\lambda}_{ij} - \lambda_{ij}|\,\right]$, where $\lambda_{ij}$ follows a given distribution P.
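Why the sum of the two errors bounds the true error (step (23)) can be seen in one line, assuming all three errors are expected absolute differences as written above: apply the triangle inequality pointwise through $\tilde{\lambda}_{ij}$, then take expectations, which preserves the inequality by linearity and monotonicity.

```latex
\left|\lambda_{ij}-\hat{\lambda}_{ij}\right|
  \le \left|\lambda_{ij}-\tilde{\lambda}_{ij}\right| + \left|\tilde{\lambda}_{ij}-\hat{\lambda}_{ij}\right|
\;\Longrightarrow\;
E_r(i,j) \le E_e(i,j) + E_m(i,j)
```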
Assume the number of events $\lambda_{ij}$ of small region $r_{ij}$ follows a Poisson distribution with expected value $\alpha_{ij}$. We first describe how to calculate the representation error of small region $r_{ij}$. Since $\lambda_{ij} \sim P(\alpha_{ij})$,
$$\Pr(\lambda_{ij} = k) = \frac{\alpha_{ij}^{k} e^{-\alpha_{ij}}}{k!}, \quad k \in \mathbb{N},$$
where $\mathbb{N}$ denotes the set of natural numbers.
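Then define the random variable $\lambda_{i,\neq j}$ as the number of events in large region $r_i$ outside small region $r_{ij}$, i.e. $\lambda_{i,\neq j} = \sum_{g \neq j} \lambda_{ig}$; by the additivity of the Poisson distribution, $\lambda_{i,\neq j} \sim P\left(\sum_{g \neq j} \alpha_{ig}\right)$. Letting $\alpha_{i,\neq j} = \sum_{g \neq j} \alpha_{ig}$, we obtain
$$\Pr(\lambda_{i,\neq j} = k) = \frac{\alpha_{i,\neq j}^{k} e^{-\alpha_{i,\neq j}}}{k!}, \quad k \in \mathbb{N}.$$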
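Because $\lambda_{ij}$ and $\lambda_{i,\neq j}$ are mutually independent, $p(r_{ij}, k_h, k_m)$ can be expressed as
$$p(r_{ij}, k_h, k_m) = \Pr(\lambda_{ij} = k_h)\,\Pr(\lambda_{i,\neq j} = k_m) = \frac{\alpha_{ij}^{k_h} e^{-\alpha_{ij}}}{k_h!} \cdot \frac{\alpha_{i,\neq j}^{k_m} e^{-\alpha_{i,\neq j}}}{k_m!}.$$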
Here, $p(r_{ij}, k_h, k_m)$ is the probability that small region $r_{ij}$ has $k_h$ events and large region $r_i$ has $k_h + k_m$ events. The representation error can then be calculated as
$$E_e(i,j) = \sum_{k_h=0}^{\infty} \sum_{k_m=0}^{\infty} p(r_{ij}, k_h, k_m) \left| \frac{k_h + k_m}{m} - k_h \right|.$$
Here, in the case where small region $r_{ij}$ has $k_h$ events and large region $r_i$ has $k_h + k_m$ events, which occurs with probability $p(r_{ij}, k_h, k_m)$, the single-case representation error of $r_{ij}$ is $\left| \frac{k_h + k_m}{m} - k_h \right|$. Thus $E_e(i,j)$ is the probability-weighted average of the single-case representation errors over all cases. Note that p denotes a probability function while P denotes a distribution.
To estimate $E_e(i,j)$, a hyperparameter K can be set and
$$\sum_{k_h=0}^{K} \sum_{k_m=0}^{K} p(r_{ij}, k_h, k_m) \left| \frac{k_h + k_m}{m} - k_h \right|$$
used as an approximation of $E_e(i,j)$. Computing $p(r_{ij}, k_h, k_m)$ directly takes $O(k_h + k_m)$ time, which gives $E_e(i,j)$ a computational complexity of $O(m^2 K^3)$. In practice, however, the computation of $p(r_{ij}, k_h, k_m)$ can be simplified recursively.
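One recursion that follows directly from the product form of $p(r_{ij}, k_h, k_m)$ is shown below as a sketch; the exact recursion used by the invention may differ. Each term is obtained from a neighbouring one in O(1) time, so the truncated double sum costs $O(K^2)$ per small region instead of $O(K^3)$:

```latex
\begin{aligned}
p(r_{ij},0,0) &= e^{-(\alpha_{ij}+\alpha_{i,\neq j})},\\
p(r_{ij},k_h+1,k_m) &= p(r_{ij},k_h,k_m)\,\frac{\alpha_{ij}}{k_h+1},\\
p(r_{ij},k_h,k_m+1) &= p(r_{ij},k_h,k_m)\,\frac{\alpha_{i,\neq j}}{k_m+1}.
\end{aligned}
```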
Thus, the representation error can be approximated by Algorithm 1.
Algorithm 1 Representation error calculation
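A Python sketch in the spirit of Algorithm 1, assuming the truncated double sum over $0 \le k_h, k_m \le K$ and a ratio recursion between neighbouring Poisson terms; the function name and the default K = 60 are illustrative assumptions, not values from the text.

```python
import math


def representation_error(alpha_ij: float, alpha_rest: float, m: int, K: int = 60) -> float:
    """Truncated estimate of E_e(i, j) for one small region r_ij.

    alpha_ij:   Poisson mean of the events in r_ij
    alpha_rest: Poisson mean of the rest of the large region, sum_{g != j} alpha_ig
    m:          number of small regions per large region
    K:          truncation hyperparameter; k_h and k_m run over 0..K
    """
    total = 0.0
    p_row = math.exp(-(alpha_ij + alpha_rest))        # p(r_ij, 0, 0)
    for k_h in range(K + 1):
        p = p_row                                     # p(r_ij, k_h, 0)
        for k_m in range(K + 1):
            # single-case error |(k_h + k_m)/m - k_h|, weighted by its probability
            total += p * abs((k_h + k_m) / m - k_h)
            p *= alpha_rest / (k_m + 1)               # -> p(r_ij, k_h, k_m + 1)
        p_row *= alpha_ij / (k_h + 1)                 # -> p(r_ij, k_h + 1, 0)
    return total


print(round(representation_error(0.0, 2.0, 4), 6))   # 0.5
```

For large means (e.g. n = 1, where a large area covers the whole city) the starting term $e^{-(\alpha_{ij}+\alpha_{i,\neq j})}$ underflows in double precision and K must exceed the bulk of the Poisson mass, so a log-space variant would be needed there.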
This algorithm shows that the representation error of region $r_{ij}$ is highly correlated with $\alpha_{ij}$ and m. To minimize the representation error $E_e(i,j)$, one would like to minimize $\alpha_{ij}$ or m. However, $\alpha_{ij}$ is determined by the small region $r_{ij}$ itself rather than by the prediction algorithm; for example, $\alpha_{ij}$ differs greatly across time periods in the same small region $r_{ij}$, and it also differs significantly between weekdays and holidays. Therefore, to minimize the representation error, a suitable n must be chosen so that m is minimized subject to the constraint nm > N. Since nm > N, making m as small as possible requires making n as large as possible.
Example 4 Searching for the optimal partitioning scheme
The sum of the representation error and the model error is an upper bound of the true error, denoted $E_u(i,j) = E_e(i,j) + E_m(i,j)$. The invention seeks the n that minimizes the sum of the true errors over all regions. In practice, however, the true error cannot be calculated directly, so the invention reduces the true error as far as possible by minimizing its upper bound. Let the sum of the true-error upper bounds of all regions be $e(n) = \sum_{i=1}^{n} \sum_{j=1}^{m} E_u(i,j)$; Algorithm 2 gives the procedure for computing $e(n)$.
Algorithm 2 Error upper bound computation
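A sketch of the summation performed by Algorithm 2, assuming the model has already been trained for the current n and its per-region absolute error $\mathbb{E}|\lambda_i-\hat{\lambda}_i|$ estimated (for example by the mean absolute error on validation data), and that $E_m(i,j)$ equals that error divided by m. The representation-error routine is passed in as a parameter, so any implementation of the truncated Poisson sum can be plugged in; all names are hypothetical.

```python
from typing import Callable, Sequence


def error_upper_bound(
    alpha: Sequence[Sequence[float]],
    large_abs_error: Sequence[float],
    rep_error: Callable[[float, float, int], float],
) -> float:
    """e(n) = sum over all small regions of E_u(i, j) = E_e(i, j) + E_m(i, j).

    alpha[i][j]:        expected events alpha_ij of small region r_ij (n rows of m values)
    large_abs_error[i]: estimate of E|lambda_i - hat_lambda_i| for large region r_i
    rep_error:          (alpha_ij, alpha_rest, m) -> E_e(i, j)
    """
    total = 0.0
    for i, row in enumerate(alpha):
        m = len(row)
        alpha_i = sum(row)
        for a_ij in row:
            total += rep_error(a_ij, alpha_i - a_ij, m)   # representation error E_e(i, j)
            total += large_abs_error[i] / m               # model error E_m(i, j)
    return total
```

Training the model for each candidate n happens outside this function, which is why Algorithm 2 dominates the cost of the search.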
Because Algorithm 2 must train a model to calculate the upper bound of the true error, executing it takes a relatively long time. With the traversal method, Algorithm 2 must be executed $O(\sqrt{N})$ times to find the optimal solution, once for each perfect square n between 1 and N. Combining the analysis of the representation error and the model error in Examples 2 and 3, $e(n)$ should first decrease and then increase as n grows, which means that there is a balance point minimizing the sum of the representation error and the model error. Consider the extreme case n = 1, in which the prediction model only needs to predict the number of orders in the whole area for a future time period: the model error is small but the representation error is large, so the prediction does not help improve the downstream algorithm. At the other extreme, n = N, the prediction model must predict event occurrences at very specific locations, which makes the model error large; although the representation error is then small, the large model error makes the prediction result hard to apply in practice. Based on this prior information, the invention provides two algorithms, the trisection search method and the local iteration method, to find an n that minimizes $e(n)$. After a suitable N is selected, the possible values of n range from 1 to N. The main idea of the trisection search method is to divide all possible values of n into three parts and compare the true errors at the two trisection points, thereby eliminating one third of the candidates. Since n is a perfect square, $\sqrt{n}$ ranges from 1 to $\lfloor\sqrt{N}\rfloor$. Denote the upper bound of the range of $\sqrt{n}$ by r and the lower bound by l; the optimal solution is obtained by repeatedly narrowing the range between l and r.
Specifically, l is initialized to 1 and r to $\lfloor\sqrt{N}\rfloor$. In each iteration, the trisection points between l and r are denoted $m_l$ and $m_r$. If $e(m_l^2) > e(m_r^2)$, set $l \leftarrow m_l$, which discards all candidates to the left of l; otherwise set $r \leftarrow m_r$, which discards all candidates to the right of r. This continues until no other candidate lies between l and r. Finally, the errors corresponding to l and r are compared, and n is set to the square of the value with the smaller error. Algorithm 3 gives the complete pseudocode. For a given N, let $T(\sqrt{N})$ denote the complexity of Algorithm 3. Since each round discards one third of the remaining candidates, $T(\sqrt{N}) = T\left(\tfrac{2}{3}\sqrt{N}\right) + O(1)$, and by the master theorem (a well-known result in algorithm analysis and design), $T(\sqrt{N}) = O(\log N)$.
Algorithm 3: Trisection search method
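The pseudocode of Algorithm 3 is not reproduced in this text. The following Python sketch is an illustration only. It runs an integer trisection (ternary) search over k = √n, as described above.

- The callable `error_bound(k)` is a hypothetical stand-in for Algorithm 2. It returns the true-error upper bound for n = k².
- Results are cached, because each call would mean training a model.
- One detail is an assumption. The loop stops once at most three candidates remain, because the trisection points stop moving at that point. It then picks the best remaining candidate rather than comparing only l and r.

```python
import math
from functools import lru_cache
from typing import Callable


def trisection_search(error_bound: Callable[[int], float], N: int) -> int:
    """Ternary search for the n = k*k that minimises a unimodal error bound (sketch).

    error_bound(k) is a hypothetical stand-in for Algorithm 2: the true-error
    upper bound when the area is split into k*k large areas.
    """
    e = lru_cache(maxsize=None)(error_bound)  # each evaluation would train a model
    l, r = 1, math.isqrt(N)                   # k = sqrt(n) ranges over 1..floor(sqrt(N))
    while r - l > 2:                          # with <= 3 candidates the trisection points stop moving
        third = (r - l) // 3
        m_l, m_r = l + third, r - third
        if e(m_l) > e(m_r):
            l = m_l  # minimum lies to the right of m_l: drop the left third
        else:
            r = m_r  # minimum lies to the left of m_r: drop the right third
    best_k = min(range(l, r + 1), key=e)      # compare the few remaining candidates
    return best_k * best_k                    # n is the square of the best k
```

Consider a toy bound such as `lambda k: (k - 20) ** 2 + 5` with N = 10000. There are 100 candidate values of k, and the search returns n = 400 after evaluating only 14 of them.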
The trisection search method reduces the time complexity of finding n from $O(\sqrt{N})$ to $O(\log N)$. However, in some cases it struggles to find the optimal solution. The invention therefore proposes a local iteration method that finds the globally optimal solution with higher probability. Because the upper bound of the true error is large when n is either large or small, the global optimum of n tends to lie in the middle of its range. A likely optimal solution can also be roughly chosen from practical experience, and a local search is then performed with that value as the initial position. The invention sets a search boundary b (typically initialized to 4). If the error at the current position p (typically initialized to 20×20) is smaller than that of every possible value within the boundary, the current position is likely to be the optimal solution. To speed up the search, Algorithm 4 scans for an n better than the current position starting from the boundary. This prevents the algorithm from degrading into a traversal when $e(n)$ is monotonically increasing or decreasing. See Algorithm 4 for details.
Algorithm 4: Local iteration method
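The pseudocode of the local iteration method is likewise not reproduced. The sketch below is one plausible reading of the description above, not the authors' code.

- It starts from k = √p. The default of 20 matches the 20×20 initial position.
- It scans the candidates within distance b (default 4), farthest first, and moves to the first one with a smaller error bound.
- It stops when nothing within the boundary is better.
- `error_bound` is again a hypothetical stand-in for Algorithm 2.

```python
import math
from functools import lru_cache
from typing import Callable


def local_iteration_search(error_bound: Callable[[int], float], N: int,
                           start_k: int = 20, b: int = 4) -> int:
    """Local search over k = sqrt(n) starting from a prior guess (sketch).

    error_bound(k) is a hypothetical stand-in for Algorithm 2 (upper bound for n = k*k).
    start_k=20 mirrors the 20x20 initial position p; b=4 is the search boundary.
    """
    e = lru_cache(maxsize=None)(error_bound)  # each evaluation would train a model
    K = math.isqrt(N)
    k = min(max(start_k, 1), K)               # clamp the prior guess into 1..K
    while True:
        better = None
        # Scan from the boundary inward (distance b, b-1, ..., 1) on both sides,
        # so a monotone stretch of e is crossed b steps at a time.
        for d in range(b, 0, -1):
            for cand in (k - d, k + d):
                if 1 <= cand <= K and e(cand) < e(k):
                    better = cand
                    break
            if better is not None:
                break
        if better is None:
            return k * k                      # nothing within the boundary beats k
        k = better                            # strictly better, so the loop terminates
```

Scanning from the boundary inward means that on a monotone stretch of $e(n)$ each move jumps b steps instead of one. For example, with `lambda k: (k - 35) ** 2`, N = 10000 and the defaults, the search returns 1225 (k = 35).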
Example 5: Searching for the optimal partitioning scheme
The method and system of the invention can effectively improve the performance of algorithms that rely on a prediction model. The invention improves the corresponding algorithms in two practical applications: task assignment and path planning.

- Task assignment sends an order with location information (e.g., a ride-hailing order or a food-delivery order) to a worker (a driver or a rider), who then completes the order.
- Path planning considers passenger orders that appear dynamically on a carpooling platform and designs a reasonable route for a driver to serve them.

Accurate order prediction can greatly improve the performance of both kinds of algorithms. POLAR and LS are two prediction-model-based task assignment algorithms, while DAIF is a prediction-model-based path planning algorithm. The invention finds a better partitioning scheme for all three algorithms and thereby improves their performance. The results use three terms:

- "Original n" denotes the partitioning scheme used in the original work.
- "Optimal n" denotes the optimal scheme found by the proposed algorithms.
- "Improvement rate" denotes the gain of the optimal scheme over the original one.

The experiments use the Xi'an dataset, whose order distribution is shown in Figure 4. Xi'an data from October 1 to October 25 serve as the training set and data from October 26 to October 29 as the validation set for training the prediction model. Finally, the crowdsourcing algorithms use the data of October 30, 2016 as the test set, which contains 109,753 orders in total.
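The prediction model is trained on event counts per large area per 30-minute slot, as specified in claim 5. A minimal sketch of that bucketing and of the date split above follows. It rests on two assumptions not in the source:

- Each order is a `(datetime, x, y)` tuple with coordinates already normalised to [0, 1) over the city's bounding box.
- The training and validation dates fall in 2016, like the stated test date.

The function names are hypothetical.

```python
from collections import Counter
from datetime import date, datetime
from typing import Iterable, Tuple


def bucket_orders(orders: Iterable[Tuple[datetime, float, float]], k: int) -> Counter:
    """Count orders per (day, 30-minute slot, large area) on a k x k grid, so n = k * k.

    Assumes x and y are normalised to [0, 1) over the city's bounding box.
    """
    counts: Counter = Counter()
    for ts, x, y in orders:
        slot = (ts.hour * 60 + ts.minute) // 30  # 0..47: 48 half-hour slots per day
        cell = int(y * k) * k + int(x * k)       # row-major index of the large area
        counts[(ts.date(), slot, cell)] += 1
    return counts


def split_phase(day: date) -> str:
    """Assign a day to the Xi'an train / validation / test split (year assumed 2016)."""
    if date(2016, 10, 1) <= day <= date(2016, 10, 25):
        return "train"
    if date(2016, 10, 26) <= day <= date(2016, 10, 29):
        return "validation"
    if day == date(2016, 10, 30):
        return "test"
    return "unused"
```

Keeping the day in the key lets the same counts feed every split without re-reading the raw orders.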
Table 2 shows that, with the optimal partitioning scheme found by the invention, both POLAR and DAIF achieve better results than with their original partitioning schemes. The scheme that LS used in its original work is already close to the optimal one, so LS improves only slightly.
Table 2: Magnitude of improvement for prediction-model-based algorithms
Besides improving existing algorithms, the invention was also compared with the traversal method, which enumerates all possible cases. The invention finds the optimal solution in a short time. The comparison uses three metrics:

- "Overhead" is the time the algorithm needs to find its solution.
- "Likelihood" is the probability that the solution found is the optimal partitioning scheme.
- "OR" is the benefit of the solution found divided by the benefit of the optimal partitioning scheme.

Experiments were performed on the New York City, metropolitan, and real datasets. The results show that both proposed algorithms find a good scheme in less time. The local iteration method is more likely to find the optimal partitioning scheme than the trisection search method. Compared with the traversal method, the trisection search method achieves an improvement of 1.65-2.17%, and the local iteration method an improvement of 0.23-1.23%.
Table 3: Performance of the search algorithms
The scope of protection of the present invention is not limited to the above embodiments. Variations and advantages that would occur to those skilled in the art without departing from the spirit and scope of the inventive concept are included in the invention, and the scope of protection is defined by the appended claims.
Claims (10)
1. A true-error-based prediction model meshing and event prediction method, characterized by comprising the following steps:
step (1), dividing a spatial region according to a parameter n, and training a prediction model;
step (2), calculating the upper bound of the true error of the current region division according to the prediction results of the model;
step (3), determining the optimal n by minimizing the true error to obtain a region division scheme, and performing event prediction according to that division scheme.
2. The true-error-based prediction model meshing and event prediction method according to claim 1, wherein the specific steps of step (1) include:
step (11), dividing the region into n large areas of the same size, where n is a perfect square;
step (12), dividing each large area into m small areas of the same size, where m is a perfect square and n×m>N; N is a hyperparameter chosen so that events within each small area are uniformly distributed, uniform distribution meaning that an event is equally likely to occur anywhere within a small area;
step (13), training a prediction model on the historical data of the large areas.
3. The true-error-based prediction model meshing and event prediction method according to claim 1, wherein the specific steps of step (2) include:
step (21), calculating the model error from the prediction results of the model, where the model error is the difference between the predicted value and the estimated value of the number of events in a small area;
step (22), estimating the representation error by a numerical optimization method, where the representation error is the difference between the estimated value and the actual value of the number of events in a small area, and the numerical optimization method simplifies the calculation of the joint Poisson distribution through recursion;
step (23), taking the sum of the representation error and the model error as the upper bound of the true error, where the true error is the difference between the predicted value and the actual value of the number of events in a small area.
4. The true-error-based prediction model meshing and event prediction method according to claim 1, wherein the method of minimizing the true error in step (3) comprises a trisection search method or a local iteration method.
5. The true-error-based prediction model meshing and event prediction method according to claim 2, wherein in step (13), the historical data refers to the number of events that occurred in each large area in each time period over the past month, each time period being 30 minutes and one day consisting of 48 time periods; the prediction model is a deep learning model.
6. The true-error-based prediction model meshing and event prediction method according to claim 5, wherein the prediction model is a deep learning model such as the DeepST model or the DMVST-Net model;
the DeepST model uses a convolutional neural network to process the closeness, period, and trend information of changes in event counts in the historical data and extracts the relevant features;
the DMVST-Net model analyzes the historical information through a temporal module and a spatial module; the temporal module processes the historical information of one region using a recurrent neural network, and the spatial module uses a recurrent neural network to analyze the influence of other regions on the number of events occurring in adjacent regions.
7. The true-error-based prediction model meshing and event prediction method according to claim 3, wherein in step (21), the predicted value of the number of events in a small area equals the model's predicted number of events in the large area for the next time period divided by m, and the estimated value of the number of events in a small area equals the actual number of events in the large area for the next time period divided by m.
8. The true-error-based prediction model meshing and event prediction method according to claim 3, wherein in step (23), the true error and its upper bound first decrease and then increase as n increases; a larger n represents a finer region division in which each region has a smaller area, model predictions are more random, the model error is larger, and the representation error is smaller; a smaller n represents larger regions, less random model predictions, a smaller model error, and a larger representation error.
9. The true-error-based prediction model meshing and event prediction method according to claim 4, wherein the trisection search method exploits the decrease-then-increase trend of the true error and divides all possible values of n into three equal parts; when the true error at the left trisection point is greater than that at the right trisection point, all values in the leftmost part are discarded; otherwise, all values in the rightmost part are discarded; the trisection search method minimizes the true error by searching for the optimal n;
and/or,
the local iteration method uses prior knowledge, namely that the optimal value of n lies somewhere in the middle of its range; the local iteration method initializes n using this prior knowledge and then searches near n for a better value; if no better value exists near n, n is taken as the optimal solution found by the algorithm; otherwise, the current optimal solution is updated.
10. A true-error-based meshing and event prediction system employing the true-error-based prediction model meshing and event prediction method according to any one of claims 1-9, the system comprising a region division module, a prediction model generation module, an error calculation module, and an optimization module;
the region division module divides the region into large areas and divides each large area into small areas;
the prediction model generation module generates a prediction model from the historical data of the large areas;
the error calculation module analyzes the model error, representation error, and true error of the prediction model;
and the optimization module searches for the n that minimizes the true error.
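Claims 3, 7 and 8 describe how the three errors relate. Below is a minimal Python sketch of that decomposition. It rests on two assumptions not stated in the document:

- Errors are absolute differences summed over all small areas.
- The representation error is computed directly from observed small-area counts. It is not estimated through the Poisson-based numerical method of step (22).

For every small area, |predicted - actual| ≤ |predicted - estimated| + |estimated - actual| (triangle inequality). The model error plus the representation error is therefore always an upper bound on the true error.

```python
from typing import Sequence, Tuple


def error_terms(pred_large: Sequence[float],
                actual_small: Sequence[Sequence[int]]) -> Tuple[float, float, float]:
    """Return (model_error, representation_error, true_error) summed over small areas.

    pred_large[i]   : model prediction for large area i in the next time period
    actual_small[i] : observed event counts in the m small areas of large area i
    Uses absolute (L1) differences; the document does not fix the error metric.
    """
    model_err = rep_err = true_err = 0.0
    for pred, small in zip(pred_large, actual_small):
        m = len(small)
        est = sum(small) / m                     # estimated small-area count: actual large count / m
        pred_small = pred / m                    # predicted small-area count: predicted large count / m
        model_err += m * abs(pred_small - est)   # same difference for each of the m small areas
        for actual in small:
            rep_err += abs(est - actual)         # cost of assuming a uniform spread
            true_err += abs(pred_small - actual)
    return model_err, rep_err, true_err
```

Take one large area predicted at 8 events whose four small areas observed 4, 0, 2 and 2 events. The model error is 0, while the representation and true errors are both 4. The coarse grid predicted the total exactly but not where the events fell.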
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211099148.6A CN117688825A (en) | 2022-09-08 | 2022-09-08 | Prediction model meshing and event prediction method and system based on true errors |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117688825A true CN117688825A (en) | 2024-03-12 |
Family
ID=90128868
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211099148.6A Pending CN117688825A (en) | 2022-09-08 | 2022-09-08 | Prediction model meshing and event prediction method and system based on true errors |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117688825A (en) |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |