Disclosure of Invention
The invention aims to solve the problem of accurately predicting traffic flow around school release events, and provides a traffic prediction method based on school release events, an electronic device and a storage medium.
In order to achieve the above purpose, the present invention is realized by the following technical solution:
A traffic prediction method based on school release events comprises the following steps:
S1, collecting school release event data to construct a school release event set, and collecting traffic data for later use;
S2, performing feature extraction on the school release event set obtained in step S1, extracting binary features, duration features, geographic features and weight features to obtain school release event set feature vector data;
S3, performing data fusion on the school release event set feature vector data obtained in step S2 and the traffic data obtained in step S1 to obtain a traffic data set of school release events;
S4, constructing a traffic prediction model based on school release events;
S5, dividing the traffic data set of school release events obtained in step S3 into a training set and a test set, inputting the training set into the traffic prediction model based on school release events constructed in step S4 for training, predicting traffic flow for the test set by using the trained traffic prediction model, and finally comparing the traffic flow prediction results with the actual traffic flow in the test set to calculate the mean square error.
Further, in step S1, school release event data, including the time data and place data of each school release event, are collected from the school calendar; traffic data before and after the occurrence of each school release event are also collected, including traffic flow data and the time data, duration data and coordinate data corresponding to the traffic flow data.
Further, the specific implementation method of the step S2 includes the following steps:
S2.1, extracting the binary feature $B$. The binary feature indicates whether a school release event exists within a given time period, $B \in \{0, 1\}$, wherein $B = 1$ indicates that there is a school release event and $B = 0$ indicates that there is no school release event;
S2.2, extracting the duration feature $D$. The duration feature is the difference between the end time and the start time of a school release event, $D = t_{end} - t_{start}$, wherein $t_{start}$ indicates the start time of the school release event and $t_{end}$ indicates the end time of the school release event;
S2.3, extracting geographic features, wherein the geographic features are the longitude and latitude of the school or its region code, and represent the place of the school release event;
S2.4, extracting weight features, wherein the weight feature represents the degree of influence of a school release event on traffic, and the calculation expression of the weight feature is as follows:
weight feature = relative change / total relative change;
the relative change represents the degree to which the average traffic flow during a school release event deviates from the usual average traffic flow, and the calculation expression of the relative change is:
relative change = (average traffic flow during the school release event - usual average traffic flow) / usual average traffic flow;
the total relative change is the sum of the relative changes of all school release events, and the total relative change is calculated by the following expression:
total relative change = Σ (relative change).
Further, the specific implementation method of the step S3 includes the following steps:
S3.1, aligning the school release event set feature vector data obtained in step S2 with the traffic data obtained in step S1: the time period of a school release event is denoted [start_time, end_time], the timestamp of a traffic data record is denoted timestamp, and the school release event time period is mapped onto the timestamp of the traffic data, wherein the calculation expression is as follows:
$$\text{event\_period}(\text{timestamp}) = [\text{start\_time}, \text{end\_time}], \quad \text{start\_time} \le \text{timestamp} \le \text{end\_time}$$
wherein $\text{event\_period}(\text{timestamp})$ is the school release event time period corresponding to the traffic data;
S3.2, performing data cleaning on the data aligned in step S3.1 by deleting abnormal values, outliers and inconsistent data;
a threshold is defined by a statistical method, specifically the mean ± 2 standard deviations is used as the outlier threshold, and the data are processed with the following calculation expression:
$$X_{clean} = \{\, x_i \in X \mid \mu - 2\sigma \le x_i \le \mu + 2\sigma \,\}$$
wherein $X$ is the original dataset, $\mu$ is the mean of the dataset, $\sigma$ is the standard deviation of the dataset, $X_{clean}$ is the cleaned dataset, and $x_i$ is a data point;
S3.3, handling missing values in the data cleaned in step S3.2, the handling being chosen according to the proportion of missing values;
S3.3.1, when the proportion of missing values is less than or equal to 5%, deleting the rows or columns in which the missing values are located, wherein the calculation expression is as follows:
$$X_{del} = \{\, x_i \in X \mid x_i \text{ is not a missing value} \,\}$$
S3.3.2, when the proportion of missing values is more than 5%, linear interpolation is used, wherein the calculation expression is as follows:
$$y = y_1 + \frac{(x - x_1)(y_2 - y_1)}{x_2 - x_1}$$
wherein $y$ is the interpolated value, $(x_1, y_1)$ and $(x_2, y_2)$ are the known data points, $x$ is the position of the missing value, $x_1$ and $x_2$ respectively represent the first independent variable and the second independent variable, and $y_1$ and $y_2$ respectively represent the first dependent variable and the second dependent variable;
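S3.3.3, when the proportion of missing values is more than 5% and the missing values cannot be filled by interpolation, filling with a specific value is used, wherein the calculation expression is as follows:
$$X_{filled} = \{\, x_i' \,\}, \qquad x_i' = \begin{cases} c, & \mathrm{isnull}(x_i) \\ x_i, & \text{otherwise} \end{cases}$$
wherein $X_{filled}$ is the filled dataset, $\mathrm{isnull}(x_i)$ is a function for judging whether element $x_i$ is a missing value, and $c$ denotes the specific fill value.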
Further, in step S4, the traffic prediction model based on school release events adopts a linear regression method, in which the fitted linear equation relates the features to the traffic flow through weights and a bias, wherein the calculation expression is as follows:
$$y = \sum_{i} w_i p_i + b$$
wherein $y$ is the traffic flow, $p_i$ is the i-th school release event feature, $w_i$ is the weight corresponding to the i-th school release event feature, and $b$ is the bias.
Further, the specific implementation method of step S5 is as follows: the features of the training set and the corresponding traffic flow data are used to train the traffic prediction model based on school release events; the trained model is used to predict traffic flow from the features of the test set; finally, the prediction results are compared with the actual traffic flow in the test set and the mean square error is calculated, wherein the calculation formula of the mean square error is:
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$
wherein $MSE$ is the mean square error, $n$ is the number of test samples, $\hat{y}_i$ is the traffic flow predicted by the model, and $y_i$ is the corresponding actual traffic flow in the test set.
Further, in step S5, the features are input into the trained traffic prediction model based on school release events to predict traffic flow.
An electronic device comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the traffic prediction method based on school release events when executing the computer program.
A computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the traffic prediction method based on school release events.
The invention has the beneficial effects that:
Compared with traditional methods based only on historical data, the traffic prediction method based on school release events provided by the invention considers the external factors that affect traffic more comprehensively.
The traffic prediction method based on school release events provides an intuitive way to explain traffic prediction results: the weights show the degree to which each school release event contributes to the traffic flow. Such an explanation helps traffic managers, city planners and decision makers understand how the traffic prediction model works and make corresponding adjustments and optimizations as needed.
The traffic prediction method based on school release events takes the influence of the school release event set into account and assigns corresponding weights to it, so that the traffic flow during school release can be predicted more accurately.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and detailed description. It should be understood that the embodiments described herein are for purposes of illustration only and are not intended to limit the invention, i.e., the embodiments described are merely some, but not all, of the embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein can be arranged and designed in a wide variety of different configurations, and the present invention can have other embodiments as well.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention.
For a further understanding of the content, features and effects of the invention, the following embodiments are described in detail with reference to accompanying drawing 1:
The first embodiment is as follows:
A traffic prediction method based on school release events comprises the following steps:
S1, collecting school release event data to construct a school release event set, and collecting traffic data for later use;
further, in step S1, school release event data, including the time data and place data of each school release event, are collected from the school calendar; traffic data before and after the occurrence of each school release event are also collected, including traffic flow data and the time data, duration data and coordinate data corresponding to the traffic flow data;
S2, performing feature extraction on the school release event set obtained in step S1, extracting binary features, duration features, geographic features and weight features to obtain school release event set feature vector data;
further, the specific implementation method of the step S2 includes the following steps:
S2.1, extracting the binary feature $B$. The binary feature indicates whether a school release event exists within a given time period, $B \in \{0, 1\}$, wherein $B = 1$ indicates that there is a school release event and $B = 0$ indicates that there is no school release event;
S2.2, extracting the duration feature $D$. The duration feature is the difference between the end time and the start time of a school release event, $D = t_{end} - t_{start}$, wherein $t_{start}$ indicates the start time of the school release event and $t_{end}$ indicates the end time of the school release event;
S2.3, extracting geographic features, wherein the geographic features are the longitude and latitude of the school or its region code, and represent the place of the school release event;
further, a binary feature is created for each possible value of the geographic feature using One-Hot encoding. If there are three schools (school A, school B, school C), three binary features are created, each indicating whether a sample belongs to the corresponding school; for example, if a sample belongs to school A, the corresponding feature value is [1, 0, 0];
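As a non-limiting illustration, steps S2.1 to S2.3 may be sketched in Python as follows. The function names, the representation of an event as a (start, end) pair, the use of minutes as the duration unit and the sample dates are assumptions made for this sketch and are not prescribed by the method.

```python
from datetime import datetime

def binary_feature(window_start, window_end, events):
    """Return 1 if any school release event overlaps the time window, else 0."""
    for start, end in events:
        if start <= window_end and end >= window_start:
            return 1
    return 0

def duration_feature(start, end):
    """Duration of an event in minutes (end time minus start time)."""
    return (end - start).total_seconds() / 60.0

def one_hot(school, schools):
    """One-Hot encode a school against a fixed, ordered list of schools."""
    return [1 if school == s else 0 for s in schools]

# Hypothetical sample data: one school release event from 16:30 to 17:15.
events = [(datetime(2024, 3, 4, 16, 30), datetime(2024, 3, 4, 17, 15))]
b = binary_feature(datetime(2024, 3, 4, 17, 0), datetime(2024, 3, 4, 17, 5), events)
d = duration_feature(*events[0])
g = one_hot("school A", ["school A", "school B", "school C"])
print(b, d, g)  # 1 45.0 [1, 0, 0]
```

The overlap test in `binary_feature` is one possible reading of "exists within a given time period"; a stricter containment test could be substituted.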
S2.4, extracting weight features, wherein the weight feature represents the degree of influence of a school release event on traffic, and the calculation expression of the weight feature is as follows:
weight feature = relative change / total relative change;
the relative change represents the degree to which the average traffic flow during a school release event deviates from the usual average traffic flow, and the calculation expression of the relative change is:
relative change = (average traffic flow during the school release event - usual average traffic flow) / usual average traffic flow;
the total relative change is the sum of the relative changes of all school release events, and the total relative change is calculated by the following expression:
total relative change = Σ (relative change);
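By way of example only, the weight feature of step S2.4 may be computed as sketched below. The function names and the sample flow values are hypothetical; the guard against a zero total is an addition of this sketch, since the expression above is undefined in that case.

```python
def relative_change(event_mean_flow, usual_mean_flow):
    """(average flow during the event - usual average flow) / usual average flow."""
    return (event_mean_flow - usual_mean_flow) / usual_mean_flow

def weight_features(event_mean_flows, usual_mean_flow):
    """Weight of each event = its relative change / sum of all relative changes."""
    changes = [relative_change(f, usual_mean_flow) for f in event_mean_flows]
    total = sum(changes)
    if total == 0:
        raise ValueError("total relative change is zero; weights are undefined")
    return [c / total for c in changes]

# Hypothetical values: three events with average flows of 150, 120 and 130
# vehicles per interval against a usual average of 100.
weights = weight_features([150.0, 120.0, 130.0], 100.0)
print(weights)  # approximately [0.5, 0.2, 0.3]
```

Because each weight is a share of the total relative change, the weights of all events sum to 1.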
S3, performing data fusion on the school release event set feature vector data obtained in step S2 and the traffic data obtained in step S1 to obtain a traffic data set of school release events;
further, the specific implementation method of the step S3 includes the following steps:
S3.1, aligning the school release event set feature vector data obtained in step S2 with the traffic data obtained in step S1: the time period of a school release event is denoted [start_time, end_time], the timestamp of a traffic data record is denoted timestamp, and the school release event time period is mapped onto the timestamp of the traffic data, wherein the calculation expression is as follows:
$$\text{event\_period}(\text{timestamp}) = [\text{start\_time}, \text{end\_time}], \quad \text{start\_time} \le \text{timestamp} \le \text{end\_time}$$
wherein $\text{event\_period}(\text{timestamp})$ is the school release event time period corresponding to the traffic data;
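As a non-limiting sketch, the alignment of step S3.1 may be implemented as an interval lookup: each traffic record is assigned the school release event period that contains its timestamp, if any. The function names, the record layout and the field names `event_period` and `event_flag` are assumptions of this sketch.

```python
from datetime import datetime

def map_event_period(timestamp, event_periods):
    """Return the (start_time, end_time) period containing timestamp, or None."""
    for start_time, end_time in event_periods:
        if start_time <= timestamp <= end_time:
            return (start_time, end_time)
    return None

def fuse_traffic_with_events(traffic_rows, event_periods):
    """Attach the matching event period (if any) to every traffic record."""
    fused = []
    for timestamp, flow in traffic_rows:
        period = map_event_period(timestamp, event_periods)
        fused.append({
            "timestamp": timestamp,
            "flow": flow,
            "event_period": period,
            "event_flag": 1 if period is not None else 0,
        })
    return fused

# Hypothetical sample data.
periods = [(datetime(2024, 3, 4, 16, 30), datetime(2024, 3, 4, 17, 15))]
rows = [(datetime(2024, 3, 4, 16, 0), 80), (datetime(2024, 3, 4, 16, 45), 150)]
fused = fuse_traffic_with_events(rows, periods)
```

Records whose timestamp falls outside every event period keep `event_period = None`, so they remain available as the "usual" baseline traffic.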
S3.2, performing data cleaning on the data aligned in step S3.1 by deleting abnormal values, outliers and inconsistent data;
a threshold is defined by a statistical method, specifically the mean ± 2 standard deviations is used as the outlier threshold, and the data are processed with the following calculation expression:
$$X_{clean} = \{\, x_i \in X \mid \mu - 2\sigma \le x_i \le \mu + 2\sigma \,\}$$
wherein $X$ is the original dataset, $\mu$ is the mean of the dataset, $\sigma$ is the standard deviation of the dataset, $X_{clean}$ is the cleaned dataset, and $x_i$ is a data point;
further, the values in the historical traffic data are grouped by type to generate value lists, such as [flow 1, flow 2, ..., flow n] or [time 1, time 2, ..., time n];
the calculation formula of the mean flow is as follows:
mean = (flow 1 + flow 2 + ... + flow n) / n;
the calculation formula of the mean time is as follows:
mean = (time 1 + time 2 + ... + time n) / n;
the calculation formula of the standard deviation of the flow is as follows:
std = sqrt(((flow 1 - mean)^2 + (flow 2 - mean)^2 + ... + (flow n - mean)^2) / n);
the calculation formula of the standard deviation of the time is as follows:
std = sqrt(((time 1 - mean)^2 + (time 2 - mean)^2 + ... + (time n - mean)^2) / n);
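The outlier rule of step S3.2 may, for example, be sketched as follows. The standard deviation divides by n, as in the formulas above; the function names and the sample flow list are hypothetical.

```python
from math import sqrt

def mean_std(values):
    """Mean and population standard deviation (dividing by n)."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, std

def remove_outliers(values, k=2.0):
    """Keep only values within mean +/- k standard deviations (k = 2 here)."""
    mean, std = mean_std(values)
    low, high = mean - k * std, mean + k * std
    return [v for v in values if low <= v <= high]

# Hypothetical flow list with one obviously abnormal reading (500).
flows = [100, 102, 98, 101, 99, 100, 103, 97, 100, 500]
cleaned = remove_outliers(flows)
print(cleaned)  # the reading 500 is dropped, the other nine values remain
```

Note that the mean and standard deviation are computed from the data including the outliers, so a single extreme value widens the threshold; this sketch applies the rule once and does not iterate.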
S3.3, handling missing values in the data cleaned in step S3.2, the handling being chosen according to the proportion of missing values;
S3.3.1, when the proportion of missing values is less than or equal to 5% and the overall data analysis is not affected, deleting the rows or columns in which the missing values are located, wherein the calculation expression is as follows:
$$X_{del} = \{\, x_i \in X \mid x_i \text{ is not a missing value} \,\}$$
S3.3.2, when the proportion of missing values is more than 5%, linear interpolation is used, wherein the calculation expression is as follows:
$$y = y_1 + \frac{(x - x_1)(y_2 - y_1)}{x_2 - x_1}$$
wherein $y$ is the interpolated value, $(x_1, y_1)$ and $(x_2, y_2)$ are the known data points, $x$ is the position of the missing value, $x_1$ and $x_2$ respectively represent the first independent variable and the second independent variable, and $y_1$ and $y_2$ respectively represent the first dependent variable and the second dependent variable;
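S3.3.3, when the proportion of missing values is more than 5% and the missing values cannot be filled by interpolation, filling with a specific value is used, wherein the calculation expression is as follows:
$$X_{filled} = \{\, x_i' \,\}, \qquad x_i' = \begin{cases} c, & \mathrm{isnull}(x_i) \\ x_i, & \text{otherwise} \end{cases}$$
wherein $X_{filled}$ is the filled dataset, $\mathrm{isnull}(x_i)$ is a function for judging whether element $x_i$ is a missing value, and $c$ denotes the specific fill value;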
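The three-way handling of step S3.3 may be sketched as below, under several assumptions made only for illustration: missing values are represented by `None`, the independent variable for interpolation is the position (index) in the series, a missing value "cannot be filled by interpolation" when it lacks a known neighbour on one side, and the fill value defaults to 0.0.

```python
def missing_ratio(values):
    """Proportion of missing (None) entries in the series."""
    return sum(v is None for v in values) / len(values)

def linear_interpolate(x, x1, y1, x2, y2):
    """y = y1 + (x - x1) * (y2 - y1) / (x2 - x1)."""
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)

def handle_missing(values, fill_value=0.0, threshold=0.05):
    # S3.3.1: few missing values, so delete them.
    if missing_ratio(values) <= threshold:
        return [v for v in values if v is not None]
    known = [(i, v) for i, v in enumerate(values) if v is not None]
    result = []
    for i, v in enumerate(values):
        if v is not None:
            result.append(v)
            continue
        left = [(j, y) for j, y in known if j < i]
        right = [(j, y) for j, y in known if j > i]
        if left and right:
            # S3.3.2: interpolate between the nearest known neighbours.
            (x1, y1), (x2, y2) = left[-1], right[0]
            result.append(linear_interpolate(i, x1, y1, x2, y2))
        else:
            # S3.3.3: no neighbour on one side, so fill with a specific value.
            result.append(fill_value)
    return result

print(handle_missing([10.0, None, 30.0, None]))  # [10.0, 20.0, 30.0, 0.0]
```

In the example, half of the values are missing, so deletion is skipped; the gap at position 1 is interpolated between 10.0 and 30.0, and the trailing gap, which has no right-hand neighbour, receives the fill value.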
S4, constructing a traffic prediction model based on school release events;
further, in step S4, the traffic prediction model based on school release events adopts a linear regression method, in which the fitted linear equation relates the features to the traffic flow through weights and a bias, wherein the calculation expression is as follows:
$$y = \sum_{i} w_i p_i + b$$
wherein $y$ is the traffic flow, $p_i$ is the i-th school release event feature, $w_i$ is the weight corresponding to the i-th school release event feature, and $b$ is the bias;
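One possible way to fit the linear equation above is ordinary least squares via the normal equations, sketched here in plain Python. The choice of least squares, the function names and the sample data are assumptions of this sketch; the method itself only specifies a linear regression with weights $w_i$ and bias $b$.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_linear(features, flows):
    """Least-squares fit of y = sum_i w_i * p_i + b; returns (weights, bias).

    Assumes the feature columns are not collinear (full column rank).
    """
    rows = [list(p) + [1.0] for p in features]  # constant column for the bias
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, flows)) for i in range(k)]
    coef = solve(xtx, xty)
    return coef[:-1], coef[-1]

def predict(weights, bias, p):
    """Evaluate y = sum_i w_i * p_i + b for one feature vector p."""
    return sum(w * x for w, x in zip(weights, p)) + bias

# Hypothetical samples: features are (binary feature, duration in minutes).
features = [(0, 0), (1, 30), (1, 45), (0, 10)]
flows = [100.0, 210.0, 240.0, 120.0]
w, b = fit_linear(features, flows)
print(w, b)  # approximately [50.0, 2.0] and 100.0
```

The sample flows were generated from y = 50·p₁ + 2·p₂ + 100, so the fit recovers those coefficients up to rounding; with real data the residual would be non-zero.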
further, the traffic flow corresponds to a congestion level; the congestion level is set manually according to the current traffic flow and is divided into three grades, A, B and C;
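Purely as an illustration of such a manual grading, a predicted flow may be mapped to a grade with two thresholds. The threshold values (200 and 400) and the ordering, with A taken as the lightest congestion and C as the heaviest, are hypothetical and would be set by the operator in practice.

```python
def congestion_grade(flow, thresholds=(200.0, 400.0)):
    """Map a traffic flow to grade A, B or C using manually set thresholds.

    Assumption of this sketch: A is the lightest grade, C the heaviest.
    """
    low, high = thresholds
    if flow < low:
        return "A"
    if flow < high:
        return "B"
    return "C"

grades = [congestion_grade(f) for f in (150.0, 250.0, 450.0)]
print(grades)  # ['A', 'B', 'C']
```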
further, the prediction accuracy of the model is evaluated using the test data by comparing the prediction results with the actual situation. The model is improved and optimized according to the evaluation results: the feature selection is adjusted, and features may be deleted and weights adjusted according to the actual evaluation results.
S5, dividing the traffic data set of school release events obtained in step S3 into a training set and a test set, inputting the training set into the traffic prediction model based on school release events constructed in step S4 for training, predicting traffic flow for the test set by using the trained traffic prediction model, and finally comparing the traffic flow prediction results with the actual traffic flow in the test set to calculate the mean square error;
further, the specific implementation method of step S5 is as follows: the features of the training set and the corresponding traffic flow data are used to train the traffic prediction model based on school release events; the trained model is used to predict traffic flow from the features of the test set; finally, the prediction results are compared with the actual traffic flow in the test set and the mean square error is calculated, wherein the calculation formula of the mean square error is:
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$
wherein $MSE$ is the mean square error, $n$ is the number of test samples, $\hat{y}_i$ is the traffic flow predicted by the model, and $y_i$ is the corresponding actual traffic flow in the test set.
Further, in step S5, the features are input into the trained traffic prediction model based on school release events to predict traffic flow.
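The data split and the evaluation of step S5 may be sketched as follows. The 80/20 split ratio and the chronological (unshuffled) split are assumptions of this sketch, as are the function names and the sample numbers; the method only requires a training set and a test set.

```python
def split_dataset(samples, train_fraction=0.8):
    """Split samples in order into a training set and a test set."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

def mean_squared_error(predicted, actual):
    """MSE = (1/n) * sum over the test samples of (predicted - actual)^2."""
    n = len(actual)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n

# Hypothetical data: ten samples, then two predictions against two actual flows.
train, test = split_dataset(list(range(10)))
mse = mean_squared_error([110.0, 190.0], [100.0, 200.0])
print(len(train), len(test), mse)  # 8 2 100.0
```

A chronological split keeps later observations out of training, which is usually the safer choice for time-ordered traffic data; a random split could be substituted if the samples are treated as independent.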
The second embodiment is as follows:
An electronic device comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the traffic prediction method based on school release events when executing the computer program.
The computer device of the present invention may be any device that includes a processor and a memory, such as a single-chip microcomputer including a central processing unit. The processor is configured to implement the steps of the traffic prediction method based on school release events when executing the computer program stored in the memory.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the device (such as audio data, a phonebook, etc.). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or other volatile solid-state storage device.
The third embodiment is as follows:
A computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the traffic prediction method based on school release events.
The computer readable storage medium of the present invention may be any form of storage medium readable by a processor of a computer device, including but not limited to non-volatile memory, volatile memory, ferroelectric memory, etc., on which a computer program is stored; when the processor of the computer device reads and executes the computer program stored in the memory, the steps of the traffic prediction method based on school release events described above can be implemented.
The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content covered by the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a given jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.
It is noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
Although the present application has been described above with reference to specific embodiments, various modifications may be made and equivalents may be substituted for elements thereof without departing from the scope of the application. In particular, the features of the embodiments disclosed in this application may be combined with each other in any way as long as there is no structural conflict; these combinations are not described exhaustively in this specification merely for the sake of brevity. Therefore, it is intended that the present application not be limited to the particular embodiments disclosed, but that the present application include all embodiments falling within the scope of the appended claims.