CN114037140A - Prediction model training method and apparatus, data prediction method and apparatus, device and storage medium - Google Patents

Info

Publication number
CN114037140A
Authority
CN
China
Prior art keywords
data
passenger flow
time
real
full
Prior art date
Legal status
Pending
Application number
CN202111300204.3A
Other languages
Chinese (zh)
Inventor
陈焕荣
潘希铭
柯凌燕
张茂华
杨德利
高晶
李玮琪
郑任
沈薇
林哲
吴育娇
邓胜
Current Assignee
Digital Guangdong Network Construction Co Ltd
Original Assignee
Digital Guangdong Network Construction Co Ltd
Priority date
Filing date
Publication date
Application filed by Digital Guangdong Network Construction Co Ltd
Priority to CN202111300204.3A
Publication of CN114037140A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477 Temporal data queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Educational Administration (AREA)
  • Remote Sensing (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the invention disclose a prediction model training method, a data prediction method, a prediction model training apparatus and a storage medium. The prediction model training method comprises: acquiring full passenger flow sample data of a target geographic area, where the full passenger flow sample data comprises historical passenger flow data and prediction-related factor sample data, and the prediction-related factor sample data comprises at least one of weather data, passenger-flow-affecting date data and transit shift data; and training a preset passenger flow prediction model on the full passenger flow sample data to obtain a multi-dimensional passenger flow prediction model. The technical solution of the embodiments can improve the accuracy of passenger flow prediction.

Description

Prediction model training method and apparatus, data prediction method and apparatus, device and storage medium
Technical Field
The embodiments of the invention relate to the technical field of data processing, and in particular to a prediction model training method and apparatus, a data prediction method and apparatus, a device and a storage medium.
Background
In recent years, passenger flow prediction has attracted wide attention from many sectors, because the ability to predict passenger flow is highly valuable in fields such as traffic congestion management and urban planning.
At present, most methods for predicting the flow of people in a region are time series prediction methods. Because traditional linear regression predicts well only on linear time series data, algorithms with stronger nonlinear fitting capability, such as LSTM (Long Short-Term Memory) networks or XGBoost (an open-source gradient boosting library), are needed to predict nonlinear time series data.
In the course of implementing the invention, the inventors found that the traditional linear regression algorithm predicts well only on linear time series data and has the following defects when applied to nonlinear time series data: (1) the feature space of passenger flow prediction is large, and logistic regression performs poorly in such a space; (2) it is prone to under-fitting, which lowers passenger flow prediction accuracy; (3) it cannot effectively handle a large number of multi-class features or variables; (4) it cannot effectively exploit the nonlinear characteristics of passenger flow. Consequently, the conventional linear regression algorithm has low accuracy for passenger flow prediction.
Disclosure of Invention
The embodiments of the invention provide a prediction model training method and apparatus, a data prediction method and apparatus, a device and a storage medium, which can improve the accuracy of passenger flow prediction.
In a first aspect, an embodiment of the present invention provides a prediction model training method, including:
acquiring full passenger flow sample data of a target geographic area, where the full passenger flow sample data comprises historical passenger flow data and prediction-related factor sample data, and the prediction-related factor sample data comprises at least one of weather data, passenger-flow-affecting date data and transit shift data; and
training a preset passenger flow prediction model on the full passenger flow sample data to obtain a multi-dimensional passenger flow prediction model.
In a second aspect, an embodiment of the present invention further provides a data prediction method, including:
acquiring full passenger flow real-time data of a target geographic area, where the full passenger flow real-time data comprises real-time passenger flow data and prediction-related factor real-time data, and the prediction-related factor real-time data comprises at least one of weather data, passenger-flow-affecting date data and transit shift data; and
inputting the full passenger flow real-time data into a multi-dimensional passenger flow prediction model to obtain passenger flow prediction data of the target geographic area;
where the multi-dimensional passenger flow prediction model is trained by the prediction model training method of the first aspect.
In a third aspect, an embodiment of the present invention further provides a prediction model training apparatus, including:
a full passenger flow sample data acquisition module, configured to acquire full passenger flow sample data of a target geographic area, where the full passenger flow sample data comprises historical passenger flow data and prediction-related factor sample data, and the prediction-related factor sample data comprises at least one of weather data, passenger-flow-affecting date data and transit shift data; and
a preset passenger flow prediction model training module, configured to train a preset passenger flow prediction model on the full passenger flow sample data to obtain a multi-dimensional passenger flow prediction model.
In a fourth aspect, an embodiment of the present invention further provides a data prediction apparatus, including:
a full passenger flow real-time data acquisition module, configured to acquire full passenger flow real-time data of a target geographic area, where the full passenger flow real-time data comprises real-time passenger flow data and prediction-related factor real-time data, and the prediction-related factor real-time data comprises at least one of weather data, passenger-flow-affecting date data and transit shift data; and
a passenger flow prediction data acquisition module, configured to input the full passenger flow real-time data into a multi-dimensional passenger flow prediction model to obtain passenger flow prediction data of the target geographic area;
where the multi-dimensional passenger flow prediction model is trained by the prediction model training method of the first aspect.
In a fifth aspect, an embodiment of the present invention further provides a computer device, including:
one or more processors; and
a storage apparatus configured to store one or more programs;
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the prediction model training method or the data prediction method provided by any embodiment of the invention.
In a sixth aspect, an embodiment of the present invention further provides a computer storage medium storing a computer program which, when executed by a processor, implements the prediction model training method or the data prediction method provided by any embodiment of the present invention.
In the embodiments of the invention, a preset passenger flow prediction model is trained on the acquired full passenger flow sample data of the target geographic area to obtain a multi-dimensional passenger flow prediction model, which is then used to predict passenger flow for the target geographic area from the full passenger flow real-time data of that area. Because the full passenger flow sample data includes several kinds of nonlinear feature data, such as historical passenger flow data, weather data, passenger-flow-affecting date data and transit shift data, the multi-dimensional passenger flow prediction model can take multiple factors influencing passenger flow into account. This addresses the low prediction accuracy of conventional methods on nonlinear passenger flow data and improves the accuracy of passenger flow prediction.
Drawings
FIG. 1 is a flowchart of a prediction model training method according to the first embodiment of the present invention;
FIG. 2 is a flowchart of a prediction model training method according to the second embodiment of the present invention;
FIG. 3 is a schematic diagram of a normal distribution of data according to the second embodiment of the present invention;
FIG. 4 is a schematic diagram of the distribution of historical passenger flow data according to the second embodiment of the present invention;
FIG. 5 is a flowchart of a data prediction method according to the third embodiment of the present invention;
FIG. 6 is a schematic diagram of how well the multi-dimensional passenger flow prediction model fits passenger flow according to the third embodiment of the present invention;
FIG. 7 is a schematic diagram of a prediction model training apparatus according to the fourth embodiment of the present invention;
FIG. 8 is a schematic diagram of a data prediction apparatus according to the fifth embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a computer device according to the sixth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention.
It should be further noted that, for the convenience of description, only some but not all of the relevant aspects of the present invention are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The terms "first", "second" and the like in the description, claims and drawings of the embodiments of the invention are used to distinguish between different objects, not to describe a particular order. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that comprises a list of steps or units is not limited to the listed steps or units, but may include steps or units that are not listed.
Example one
Fig. 1 is a flowchart of a prediction model training method according to the first embodiment of the present invention. This embodiment is applicable to training a multi-dimensional passenger flow prediction model with multi-dimensional sample data. The method may be executed by a prediction model training apparatus, which may be implemented in software and/or hardware and is generally integrated in a computer device. As shown in Fig. 1, the method comprises the following operations:
S110, acquiring full passenger flow sample data of the target geographic area, where the full passenger flow sample data comprises historical passenger flow data and prediction-related factor sample data, and the prediction-related factor sample data comprises at least one of weather data, passenger-flow-affecting date data and transit shift data.
The target geographic area may be any area for which passenger flow needs to be predicted, such as an airport, a railway station, a coach station, a bus station or a subway station; the embodiments of the present invention do not limit the type of the target geographic area. The full passenger flow sample data may be multi-dimensional sample data including historical passenger flow data and data on factors influencing passenger flow. The prediction-related factor sample data may be sample data on environmental factors that influence passenger flow, including but not limited to weather data, passenger-flow-affecting date data and transit shift data. The passenger-flow-affecting date data may be data on dates that influence passenger flow, such as public holidays or other dates with special circumstances. The transit shift data may be information on the scheduled vehicle services (shifts) operating in the target geographic area. For example, when the target geographic area is a railway station, the transit shift data may be the schedule data of the individual trains serving that station.
In the embodiments of the invention, to improve the accuracy of the passenger flow prediction model, full passenger flow sample data comprising both the historical passenger flow data and the prediction-related factor sample data of the target geographic area may be acquired. Optionally, the historical passenger flow data may be derived from historical user location sample data of the target geographic area, or the historical passenger flow data provided by the target geographic area itself may be used directly. The sample time range of the historical passenger flow data may be chosen according to actual service requirements, for example the most recent six months or three months; the embodiments of the present invention do not limit this range. In addition, the historical passenger flow data may include two types of passenger flow statistics: incoming passenger flow and outgoing passenger flow.
With the development of mobile communication technology, cellular signals now cover virtually every corner of a city, and large volumes of mobile phone signaling data have accumulated as a result. In the embodiments of the present invention, the historical passenger flow data may optionally be determined from user location sample data, such as operator user signaling location statistics sample data. Analyzing this signaling data makes it possible to identify users' travel patterns and travel modes and, on that basis, to extract more valuable information about how users move through the target geographic area. Optionally, the operator user signaling location statistics sample data may be obtained from user location snapshot data of a telecommunications operator, which may include, but is not limited to, fields such as the user number, the snapshot time and the ID of the base station serving the user.
For example, the historical passenger flow of the target geographic area may be filtered and counted from the raw location signaling data using the base station IDs that correspond to the target geographic area. Optionally, the historical passenger flow may be analyzed per data statistics period. For example, for each user in the area during the current statistics period, checking whether that user was also in the target geographic area in the previous period and in the next period determines whether the user is counted as entering or leaving the area, so that the historical passenger flow time series of the target geographic area can be extracted from the location signaling data. Optionally, the full passenger flow sample data may be collected from the raw location signaling data at a preset sample collection interval (for example, every 5 minutes).
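The entering/leaving rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: snapshots are assumed to be already bucketed into integer statistics periods, the `(user, period, base_station_id)` tuple layout and the function names are hypothetical, and a user seen on any base station of the target area counts as present. A user present in period t but absent in t-1 is counted as entering in t; a user present in t but absent in t+1 is counted as leaving in t.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical snapshot layout: (user_number, period_index, base_station_id).
Snapshot = Tuple[str, int, str]


def users_per_period(snapshots: Iterable[Snapshot],
                     area_stations: Set[str]) -> Dict[int, Set[str]]:
    """Map each period to the set of users seen on a base station of the area."""
    present: Dict[int, Set[str]] = defaultdict(set)
    for user, period, station in snapshots:
        if station in area_stations:
            present[period].add(user)
    return present


def inflow_outflow(snapshots: Iterable[Snapshot], area_stations: Set[str],
                   periods: List[int]) -> Dict[int, Tuple[int, int]]:
    """Return {period: (entering, leaving)} for each requested period.

    Periods outside the observed window are treated as empty, so counts for
    the first and last periods of the window are biased upwards.
    """
    present = users_per_period(snapshots, area_stations)
    counts: Dict[int, Tuple[int, int]] = {}
    for t in periods:
        now = present.get(t, set())
        entering = now - present.get(t - 1, set())  # absent in previous period
        leaving = now - present.get(t + 1, set())   # absent in next period
        counts[t] = (len(entering), len(leaving))
    return counts


snaps = [("u1", 0, "A"), ("u1", 1, "A"), ("u2", 1, "B"),
         ("u2", 2, "X"), ("u3", 2, "A")]
print(inflow_outflow(snaps, {"A", "B"}, [0, 1, 2]))
# → {0: (1, 0), 1: (1, 2), 2: (1, 1)}
```

In the usage example, base station "X" is outside the area, so `u2` is counted as leaving after period 1.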
The target geographic area itself will usually keep historical passenger flow statistics based on people entering and leaving the area. The historical passenger flow data can therefore also be taken directly from the actual passenger flow counts recorded in the target geographic area, which helps ensure the accuracy of the sample data.
In addition, considering the density of the sample data, after the full passenger flow sample data of the target geographic area is acquired, it may be resampled at a chosen sampling interval, and the resampled data is used to train the preset passenger flow prediction model. For example, the full passenger flow sample data may be resampled at 30-minute intervals, yielding sample data with a 30-minute statistics period.
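A minimal sketch of resampling 5-minute data into 30-minute statistics periods is shown below. It assumes the 5-minute values are passenger counts (entering or leaving) and aggregates them by summation; the patent does not say whether values are summed or subsampled, and the function name `resample_sum` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict


def resample_sum(series: Dict[datetime, int],
                 window_minutes: int = 30) -> Dict[datetime, int]:
    """Sum counts into fixed windows aligned to the start of the hour.

    Assumes window_minutes divides 60 (e.g. 5, 10, 15, 30, 60).
    """
    if 60 % window_minutes != 0:
        raise ValueError("window_minutes must divide 60")
    out: Dict[datetime, int] = {}
    for ts in sorted(series):
        # Floor the timestamp to the start of its window.
        start = ts.replace(minute=ts.minute - ts.minute % window_minutes,
                           second=0, microsecond=0)
        out[start] = out.get(start, 0) + series[ts]
    return out


t0 = datetime(2021, 11, 1, 8, 0)
five_min = {t0 + timedelta(minutes=5 * i): 1 for i in range(12)}
thirty_min = resample_sum(five_min)
```

Here the twelve 5-minute counts collapse into two 30-minute buckets (08:00 and 08:30) of 6 each. Summation keeps totals comparable across intervals; a library such as pandas offers the same operation through its resampling API.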
Note also that passenger-flow-affecting date data for holidays provides an important reference for predicting peak passenger flow times and population movement patterns, and that the number of transit shifts and the weather in the target geographic area also strongly influence passenger flow. Therefore, in addition to the historical passenger flow data, prediction-related factor sample data such as weather data, passenger-flow-affecting date data and transit shift data should be considered together when constructing the full passenger flow sample data, so that the full passenger flow sample data is accurate.
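One way to assemble a multi-dimensional sample row is to join the historical flow series with the prediction-related factors on timestamp. The sketch below is illustrative only: the `FullSample` schema, the integer weather encoding and the `build_samples` helper are hypothetical, and factors missing for a timestamp are left as `None` for the preprocessing steps of the second embodiment.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class FullSample:
    """One row of full passenger flow sample data (hypothetical schema)."""
    timestamp: datetime
    inflow: int
    outflow: int
    weather_code: Optional[int]  # hypothetical encoding, e.g. 0 = clear, 1 = rain
    is_special_date: bool        # holiday or other passenger-flow-affecting date
    shift_count: Optional[int]   # scheduled transit services in this period


def build_samples(flows: Dict[datetime, Tuple[int, int]],
                  weather: Dict[datetime, int],
                  special_dates: Set[date],
                  shifts: Dict[datetime, int]) -> List[FullSample]:
    """Join historical (inflow, outflow) with prediction-related factors."""
    rows: List[FullSample] = []
    for ts in sorted(flows):
        inflow, outflow = flows[ts]
        rows.append(FullSample(ts, inflow, outflow, weather.get(ts),
                               ts.date() in special_dates, shifts.get(ts)))
    return rows


ts = datetime(2021, 10, 1, 8, 0)
rows = build_samples({ts: (120, 80)}, {}, {date(2021, 10, 1)}, {ts: 14})
```

Keeping the factors as explicit fields makes it easy to see which dimension is missing for a given period before the data is fed to a model.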
S120, training a preset passenger flow prediction model on the full passenger flow sample data to obtain a multi-dimensional passenger flow prediction model.
The preset passenger flow prediction model may be a pre-constructed passenger flow prediction model that becomes a mature model after training. The multi-dimensional passenger flow prediction model predicts future passenger flow from real-time passenger flow while taking multiple dimensions of influencing factors into account.
Accordingly, after the full passenger flow sample data of the target geographic area is acquired, the preset passenger flow prediction model may be trained on it to determine mature model parameters, from which the final multi-dimensional passenger flow prediction model is constructed. For example, if the preset passenger flow prediction model is a neural network, training may proceed as follows. The full passenger flow sample data is fed into the neurons of the model and propagated forward to obtain the neuron outputs. The outputs are passed to an error function and compared with the expected values to obtain an error. Gradients are then computed by back propagation and used to adjust the model parameters so that the error of the neuron outputs tends to 0 or converges. This process is repeated until a set number of training iterations is reached or the mean error no longer decreases, at which point training ends. The resulting multi-dimensional passenger flow prediction model can take multiple influencing factors into account when predicting passenger flow, which helps ensure prediction accuracy.
Optionally, after the full passenger flow sample data of the target geographic area is acquired, it may be divided chronologically into a training set and a test set. For example, the earliest 70% of the full passenger flow sample data forms the training set and the remaining, most recent 30% forms the test set. After the multi-dimensional passenger flow prediction model is obtained by training the preset passenger flow prediction model on the training set, it can be evaluated on the test set and its parameters adjusted according to the test results, further improving its accuracy.
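The chronological 70/30 split can be sketched as follows, assuming the samples are already sorted by time; the function name `chronological_split` is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(samples: Sequence[T],
                        train_ratio: float = 0.7) -> Tuple[List[T], List[T]]:
    """Split time-ordered samples without shuffling.

    The earliest train_ratio share becomes the training set and the rest,
    which lies entirely after it in time, becomes the test set.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must lie strictly between 0 and 1")
    cut = int(len(samples) * train_ratio)
    return list(samples[:cut]), list(samples[cut:])


train, test = chronological_split(list(range(100)))
```

Splitting by time rather than at random keeps future observations out of the training set, so the test score reflects how the model would perform on genuinely unseen periods.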
In the embodiments of the invention, a preset passenger flow prediction model is trained on the acquired full passenger flow sample data of the target geographic area to obtain a multi-dimensional passenger flow prediction model, which can predict passenger flow for the target geographic area from the full passenger flow real-time data of that area. Because the full passenger flow sample data includes several kinds of nonlinear feature data, such as historical passenger flow data, weather data, passenger-flow-affecting date data and transit shift data, the multi-dimensional passenger flow prediction model can take multiple factors influencing passenger flow into account. This addresses the low prediction accuracy of conventional methods on nonlinear passenger flow data and improves the accuracy of passenger flow prediction.
Example two
Fig. 2 is a flowchart of a prediction model training method according to the second embodiment of the present invention. This embodiment builds on the foregoing embodiment and provides several specific optional operations for preprocessing the full passenger flow sample data after it is acquired for the target geographic area, as well as a specific optional implementation for training the preset passenger flow prediction model on the full passenger flow sample data and evaluating the model. Accordingly, as shown in Fig. 2, the method of this embodiment may include:
S210, acquiring full passenger flow sample data of the target geographic area.
Sample data inevitably contains missing values and outliers, and such abnormal data can degrade the final prediction. Therefore, before the preset passenger flow prediction model is trained on the full passenger flow sample data, the data needs to be preprocessed, for example by removing outliers and filling missing values, to make it more usable. The preprocessing is described in steps S220 to S230 below.
S220, filling data in the full passenger flow sample data when first abnormal sample data exists in it.
The first abnormal sample data is abnormal sample data that needs to be filled.
In the embodiments of the invention, the presence of first abnormal sample data in the full passenger flow sample data indicates that sample data is missing. For example, when the time interval between the current sample and the previous adjacent sample is large, or the current sample for the current statistics period is empty, the current sample may be identified as first abnormal sample data. If first abnormal sample data is found, data filling may be performed on the full passenger flow sample data, specifically on the first abnormal sample data.
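Both detection conditions above (a gap between adjacent samples, or an empty statistics period) can be checked by walking the expected collection grid. The sketch below assumes a fixed 5-minute collection interval and represents an empty period as `None`; the function name `find_missing` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def find_missing(series: Dict[datetime, Optional[float]], start: datetime,
                 end: datetime,
                 step: timedelta = timedelta(minutes=5)) -> List[datetime]:
    """Return the expected collection times in [start, end] with no usable value.

    A time is flagged if it is absent from the series (a gap between adjacent
    samples) or present with a None value (an empty statistics period).
    """
    missing: List[datetime] = []
    ts = start
    while ts <= end:
        if series.get(ts) is None:
            missing.append(ts)
        ts += step
    return missing


t0 = datetime(2021, 11, 1, 8, 0)
observed = {t0: 10.0, t0 + timedelta(minutes=5): None,
            t0 + timedelta(minutes=15): 7.0, t0 + timedelta(minutes=20): 3.0}
gaps = find_missing(observed, t0, t0 + timedelta(minutes=20))
```

In this example 08:05 is flagged because its period is empty and 08:10 because it is absent altogether.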
In an optional embodiment of the present invention, filling data in the full passenger flow sample data may include: acquiring the sample collection times corresponding to the full passenger flow sample data; and filling data in the full passenger flow sample data according to the sample collection times and a preset data filling rule.
The sample collection time is the time at which the full passenger flow sample data was collected. For example, the raw location signaling data is collected every 5 minutes and the historical passenger flow data is derived from it; alternatively, raw historical passenger flow statistics are collected every 5 minutes to obtain the historical passenger flow data. The preset data filling rule is a rule defined in advance for filling missing values in the full passenger flow sample data.
Specifically, to fill full passenger flow sample data that contains first abnormal sample data, the sample collection times of the data may be determined first, and the data is then filled according to those sample collection times and the preset data filling rule.
In an optional embodiment of the present invention, the data padding, according to the sample acquisition time and a preset data padding rule, the data padding may include: determining target sample acquisition time corresponding to the first abnormal sample data; sample data corresponding to the target sample acquisition time is empty; determining a time difference between the target sample collection time and a sample collection time at a time prior to the target sample collection time; under the condition that the time difference value is determined to be smaller than or equal to the time difference value threshold, obtaining sample data corresponding to the sample acquisition time of the previous time, and filling the sample data corresponding to the target sample acquisition time according to the sample data corresponding to the sample acquisition time of the previous time; and under the condition that the time difference value is larger than the time difference value threshold value, acquiring sample data corresponding to the sample acquisition time of the interval setting period, and filling the sample data corresponding to the target sample acquisition time according to the sample data corresponding to the sample acquisition time of the interval setting period.
Wherein, the target sample collection time may be a sample collection time in which a missing value exists. The time difference threshold may be set according to the sample collection time and the actual requirement, for example, when the sample collection time is 5 minutes, the time difference threshold may be set to 15 minutes, and the embodiment of the present invention does not limit the specific value of the time difference threshold. The setting period can also be set according to actual requirements, such as 24 hours, and the embodiment of the present invention also does not limit the specific value of the setting period.
Specifically, when data filling is performed on full passenger flow volume sample data, the target sample acquisition time corresponding to the first abnormal sample data may be determined first. In the embodiment of the invention, the target sample acquisition time corresponding to the first abnormal sample data can be determined in a plurality of different ways.
Optionally, the target sample acquisition time may be determined directly by checking whether sample data exists at each sample acquisition time. For example, assume a 5-minute acquisition period and four sample acquisition times in the full passenger flow volume sample data: the 5th, 10th, 15th, and 20th minutes. If the sample data at the 10th and 15th minutes is empty, that data may be determined as first abnormal sample data, and the 10th and 15th minutes are the target sample acquisition times.
Optionally, the target sample acquisition time may also be inferred by analyzing the sample data that actually exists. For example, assume a 5-minute acquisition period and that sample data actually exists only at the 5th, 15th, and 30th minutes. The samples collected at the 15th and 30th minutes can then be determined as first abnormal sample data, because each follows a gap. Taking the 15th-minute sample as an example: the 10th minute, which lies between the 5th and 15th minutes, is a target sample acquisition time whose sample data is empty. Likewise, the 20th and 25th minutes are target sample acquisition times for the gap before the 30th minute.
Once the target sample acquisition times to be filled are determined, the time difference between each target sample acquisition time and the sample acquisition time preceding it can be calculated. For example, assume a 5-minute acquisition period and four sample acquisition times: the 5th, 10th, 15th, and 20th minutes. If the sample data at the 10th and 20th minutes is empty, the sample acquisition time preceding the 10th minute is the 5th minute, so the reference sample data for the 10th minute is the sample data collected at the 5th minute. Similarly, the sample acquisition time preceding the 20th minute is the 15th minute, so the reference sample data for the 20th minute is the sample data collected at the 15th minute.
After the time difference is determined, data filling can be performed according to its value. Optionally, if the time difference is less than or equal to the time difference threshold, the target sample acquisition time is close to the preceding sample acquisition time, and the sample data at the preceding sample acquisition time can be copied directly to fill the target. If the time difference is greater than the time difference threshold, the target is far from the preceding sample acquisition time; in that case, the sample data one set period earlier, for example at the same time point 24 hours before the target sample acquisition time, can be acquired and copied to fill the target. The benefit of this approach is that filling makes the fullest use of the underlying regularity of passenger flow data, which keeps the filled values, and therefore the sample data as a whole, accurate.
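The inference of target sample acquisition times from the samples that actually exist can be sketched as below. This is a minimal illustration, assuming acquisition times are integer minutes and samples are expected every `period` minutes; `find_target_times` is a hypothetical helper name, not part of the original method.

```python
def find_target_times(observed_minutes, period=5):
    """Return the empty acquisition times implied by gaps in the observed data.

    Hypothetical helper: `observed_minutes` are the acquisition times (in
    minutes) that actually carry sample data, and samples are expected
    every `period` minutes.
    """
    observed = sorted(set(observed_minutes))
    targets = []
    for prev, curr in zip(observed, observed[1:]):
        # Every expected slot strictly between two observed samples is empty.
        targets.extend(range(prev + period, curr, period))
    return targets


print(find_target_times([5, 15, 30]))  # [10, 20, 25]
```

With data only at the 5th, 15th, and 30th minutes, the sketch reports the 10th, 20th, and 25th minutes as target sample acquisition times, matching the example above.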
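The filling rule can be sketched as follows. This is an illustrative sketch only, assuming integer-minute timestamps, a 15-minute time difference threshold, and a 24-hour set period as in the examples; it also assumes that "preceding sample acquisition time" means the most recent time that actually holds data, and that filled values are not reused as fill sources. The function name is hypothetical.

```python
import bisect


def fill_missing(series, targets, diff_threshold=15, set_period=24 * 60):
    """Fill empty acquisition times following the rule described above.

    Assumptions (hypothetical, for illustration): times are integer minutes,
    `series` maps each observed time to its passenger count, and only
    originally observed values serve as fill sources.
    """
    observed = sorted(series)
    filled = dict(series)
    for target in sorted(targets):
        i = bisect.bisect_left(observed, target)  # observed[i-1] precedes target
        if i > 0 and target - observed[i - 1] <= diff_threshold:
            # Short gap: copy the value at the preceding acquisition time.
            filled[target] = series[observed[i - 1]]
        elif target - set_period in series:
            # Long gap: copy the value one set period (e.g. 24 h) earlier.
            filled[target] = series[target - set_period]
        # Otherwise no source exists and the slot is left empty.
    return filled


series = {25: 85, 1440: 120, 1445: 130, 1470: 150}
filled = fill_missing(series, [1450, 1455, 1460, 1465])
print(filled[1460], filled[1465])  # 130 85
```

Targets within 15 minutes of minute 1445 copy its value (130). Minute 1465 is 20 minutes past the last observation, so it falls back to the value 24 hours earlier (minute 25).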
And S230, under the condition that the second abnormal sample data exists in the full passenger flow volume sample data, performing data cleaning on the full passenger flow volume sample data.
The second abnormal sample data may be abnormal sample data that needs to be data cleaned.
In the embodiment of the invention, the presence of second abnormal sample data indicates that the full passenger flow volume sample data contains outliers. For example, if the difference between the current sample data and other sample data exceeds a set threshold, the current sample data may be determined as second abnormal sample data. Accordingly, if second abnormal sample data is found in the full passenger flow volume sample data, data cleaning can be performed on the full passenger flow volume sample data, specifically by deleting the second abnormal sample data.
In an optional embodiment of the present invention, performing data cleaning on the full passenger flow volume sample data may include: when the full passenger flow volume sample data is determined to follow a normal distribution, calculating its standard deviation and performing data cleaning according to that standard deviation; and when the full passenger flow volume sample data is determined not to follow a normal distribution, determining its benchmark reference data, which is derived from the mean of the full passenger flow volume sample data, and performing data cleaning according to the benchmark reference data.
The benchmark reference data may be a data threshold determined according to the full passenger flow volume sample data, and is used for performing outlier screening on the full passenger flow volume sample data.
It can be understood that batch data usually follows some distribution pattern. Therefore, before cleaning the full passenger flow volume sample data, it can first be determined whether the data follows an exploitable distribution. Optionally, if the full passenger flow volume sample data follows a normal distribution, data cleaning may be performed based on that distribution. FIG. 3 is a schematic diagram, provided by the second embodiment of the present invention, of data following a normal distribution. As shown in FIG. 3, the normal distribution obeys the $3\sigma$ principle ($\sigma$ denotes the standard deviation and $\mu$ the mean): a value lying more than 3 standard deviations from the mean can be considered an outlier. For a normal distribution, the probability of a value falling within $\mu \pm 3\sigma$ is 99.7%, so the probability of a value falling outside $\mu \pm 3\sigma$ is only about 0.3%, which makes it a very small-probability event. Therefore, if the full passenger flow volume sample data is determined to follow a normal distribution, its standard deviation can be calculated, 3 or 6 times the standard deviation can be used as the reference threshold for data screening, and sample data whose deviation from the mean exceeds the reference threshold can be deleted as abnormal data.
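A minimal sketch of 3σ cleaning, assuming the data has already been judged approximately normal and using the population standard deviation from Python's `statistics` module. `three_sigma_clean` is a hypothetical name.

```python
import statistics


def three_sigma_clean(values, k=3.0):
    """Keep values within k standard deviations of the mean (3-sigma rule for k=3).

    Assumes the data have already been judged approximately normal; the
    population standard deviation is used here.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [v for v in values if abs(v - mu) <= k * sigma]


flows = [100, 105, 95] * 7 + [1000]
print(three_sigma_clean(flows) == [100, 105, 95] * 7)  # True
```

The rule needs a reasonable number of samples. With n values and the population standard deviation, no single point can lie more than (n−1)/√n standard deviations from the mean, so for n ≤ 10 even an extreme spike is never flagged at k = 3.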
Correspondingly, if the full passenger flow volume sample data does not follow a normal distribution, its benchmark reference data can be determined directly. For example, the mean of the full passenger flow volume sample data may be calculated, and a set multiple of the mean (e.g., 3 or 5 times) used as the benchmark reference data. FIG. 4 is a schematic diagram of the distribution of historical passenger flow volume data according to the second embodiment of the present invention. As shown in FIG. 4, once the benchmark reference data is determined, any sample in the full passenger flow volume sample data that exceeds it fluctuates excessively and may be abnormal, so it can be deleted as abnormal data.
Besides the above methods, other data cleaning methods may be adopted, such as graphical methods like box plots, or modeling methods like linear regression, clustering algorithms, and the K-nearest-neighbor algorithm; the embodiment of the present invention does not limit this.
It should be noted that FIG. 2 shows only one implementation: step S220 may be performed before step S230; only step S220 may be performed; step S220 may be skipped and step S230 performed directly; or neither step may be performed. The embodiment of the present invention does not limit this.
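The mean-based screening for non-normal data can be sketched as below, assuming the benchmark reference data is a fixed multiple of the mean (the multiple is a hypothetical parameter; the text suggests 3 or 5).

```python
import statistics


def benchmark_clean(values, multiple=3.0):
    """Drop values above the benchmark, taken here as `multiple` x the mean.

    `multiple` is an assumed parameter (the text suggests e.g. 3 or 5).
    """
    benchmark = multiple * statistics.fmean(values)
    return [v for v in values if v <= benchmark]


print(benchmark_clean([10, 12, 8, 11, 9, 200]))  # [10, 12, 8, 11, 9]
```

Here the mean is about 41.7, so the benchmark is about 125 and only the spike of 200 is removed. Because the mean itself is pulled up by large outliers, a higher multiple (such as 5) tolerates more of them.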
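The box plot method mentioned above can be sketched with the interquartile range (IQR). This is an illustration of that alternative, not part of the claimed method. It assumes the conventional 1.5 × IQR whiskers and Python's default `statistics.quantiles` interpolation.

```python
import statistics


def iqr_clean(values, whisker=1.5):
    """Box-plot rule: keep values inside [Q1 - w*IQR, Q3 + w*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if low <= v <= high]


print(iqr_clean([10, 11, 12, 13, 14, 15, 16, 17, 200]))
# [10, 11, 12, 13, 14, 15, 16, 17]
```

Unlike the mean-based benchmark, the quartiles are barely affected by a single extreme value, which makes this rule robust for skewed passenger flow data.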
S240, training a preset passenger flow prediction model according to the full passenger flow sample data to obtain a multi-dimensional passenger flow prediction model.
It can be understood that after data preprocessing, the full passenger flow volume sample data is basically ready for modeling. However, to further mine the data and find features better suited to the actual business, new features need to be derived from the existing features of the full passenger flow volume sample data.
Specifically, the passenger flow volume x(k) of the target geographic area at the current time depends on the time of day (the day dimension), the day of the week (the week dimension), the date within the month (the month dimension), and whether the day is a holiday. In addition, it depends on the horizontal and vertical trends of the passenger flow, that is, on the historical passenger flow volumes at the p time steps preceding the current time. Therefore, the time-series feature variable of the passenger flow volume sample data can be described as:
Figure 827531DEST_PATH_IMAGE004
and finally, combining the passing shift information and the weather information of the target geographic area with the time sequence characteristic variable to obtain the final time sequence characteristic variable of the full passenger flow sample data:
Figure 993326DEST_PATH_IMAGE005
accordingly, in an optional embodiment of the present invention, the preset passenger volume prediction model may be expressed based on the following formula:
$$\hat{y}_n(k) = F(x), \quad x = \big[\mathrm{time}(n), \mathrm{weekday}(n), \mathrm{isHoliday}(n), y_n(k-p), y_n(k-p+1), \ldots, y_n(k-1), \mathrm{arrivecnt}(k), \mathrm{leavecnt}(k), \mathrm{arrivenum}(k), \mathrm{leavenum}(k), \mathrm{istfyj}(n), \mathrm{isbyyj}(n), \mathrm{isdwyj}(n), \mathrm{weather}(n)\big]$$
where $\hat{y}$ denotes the predicted passenger flow of the target geographic area at time k; x denotes the time-series feature variable of the full passenger flow volume sample data, which is also the multi-dimensional input feature of the preset passenger flow prediction model; k denotes time; n denotes the identifier of the target geographic area; time(n) denotes the historical passenger flow time of the target geographic area; weekday(n) and isHoliday(n) denote the date data affecting passenger flow (weekday or weekend, and whether the day is a statutory holiday); $y_n(k-p)$, $y_n(k-p+1)$, and $y_n(k-1)$ denote the historical passenger flow at times k-p, k-p+1, and k-1; arrivecnt(k) and leavecnt(k) denote the passenger flow arriving at and leaving the target geographic area at time k; arrivenum(k) and leavenum(k) denote the number of transit shifts arriving at and leaving the target geographic area at time k; istfyj(n), isbyyj(n), and isdwyj(n) denote different types of weather warnings; and weather(n) denotes weather data.
In the embodiment of the invention, the preset passenger flow prediction model can predict the passenger flow of the target geographic area at time k following the idea of the XGBoost (eXtreme Gradient Boosting) algorithm. The idea of XGBoost is as follows: trees are added one at a time, each grown by feature splitting to fit the residuals of the previous round's prediction. Once training has produced K trees, the score of a sample is predicted as follows: according to its features, the sample falls into one leaf node in each tree, each leaf node carries a score, and the predicted value of the sample is simply the sum of the scores from all trees.
In an optional embodiment of the present invention, training the preset passenger flow prediction model according to the full passenger flow volume sample data may include: training the preset passenger flow prediction model according to an objective function and the full passenger flow volume sample data, where the objective function may be expressed as:
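Assembling the input vector x from the symbols above can be sketched as follows. The dict layout, the feature order, and the integer encodings (e.g. weather as a code) are hypothetical. The text fixes which quantities enter x but not how they are stored.

```python
def build_features(record, history, p):
    """Assemble the input vector x for area n at time k (hypothetical layout).

    `record` is a dict keyed by the symbol names above for time k;
    `history` lists past passenger flows y_n(...) ending at y_n(k-1).
    """
    if p < 1 or len(history) < p:
        raise ValueError("need p >= 1 and at least p historical values")
    lags = list(history[-p:])  # y_n(k-p), ..., y_n(k-1)
    return [
        record["time"], record["weekday"], record["isHoliday"],
        *lags,
        record["arrivecnt"], record["leavecnt"],
        record["arrivenum"], record["leavenum"],
        record["istfyj"], record["isbyyj"], record["isdwyj"],
        record["weather"],
    ]


record = dict(time=17, weekday=5, isHoliday=0, arrivecnt=320, leavecnt=280,
              arrivenum=12, leavenum=9, istfyj=0, isbyyj=1, isdwyj=0, weather=3)
x = build_features(record, [410, 395, 430, 452], p=3)
print(x)  # [17, 5, 0, 395, 430, 452, 320, 280, 12, 9, 0, 1, 0, 3]
```

The lag window is the only part that depends on p: the vector always has p + 11 entries, so the model input width is fixed once p is chosen.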
Figure 461534DEST_PATH_IMAGE007
where J(F) denotes the objective function, whose goal is to minimize the error between the predicted and actual passenger flow values; L(F) and the per-sample loss $l(y_i, \hat{y}_i)$ denote the training loss function; Ω(F) and Ω($f_t$) denote regularization terms, which reduce overfitting of the prediction; γ denotes a weight parameter that can be configured according to actual requirements; T denotes the number of leaf nodes of a tree in the extreme gradient boosting algorithm; and $w_j$ denotes the weight of each leaf node. After the t-th iteration, the predicted value of the preset passenger flow prediction model is the sum of its predicted value after t-1 iterations and the predicted value of the t-th tree, so that:
$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$$
To find the optimal solution of the objective function, a second-order Taylor expansion can be introduced to transform it. Accordingly, J(F) may be rewritten as:
Figure 652978DEST_PATH_IMAGE010
Further derivation from the above formula gives:
Figure 788424DEST_PATH_IMAGE011
wherein:
Figure 132818DEST_PATH_IMAGE012
Finally, the optimal solution for $w_j$ is found to be
Figure 37320DEST_PATH_IMAGE013
At this point, the objective function reaches its minimum, the error between the predicted and actual passenger flow is smallest, and the preset passenger flow prediction model attains its optimal solution.
Accordingly, the preset passenger flow prediction model can be trained on the time-series feature variable data of the full passenger flow volume sample data to obtain the multi-dimensional passenger flow prediction model.
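The optimal leaf weight can be illustrated with the standard XGBoost result. This sketch assumes the usual regularizer Ω($f_t$) = γT + ½λΣ$w_j^2$, which gives $w_j^* = -G_j/(H_j+\lambda)$ and a minimum objective of $-\tfrac{1}{2}\sum_j G_j^2/(H_j+\lambda) + \gamma T$. Here $G_j$ and $H_j$ are the sums of first and second loss derivatives over the samples in leaf j. The L2 weight λ is not named in the text above, and the document's own formula is given only as an image, so treat this as an assumption. The loss is taken as squared error ½(y − ŷ)², for which $g_i = \hat{y}_i - y_i$ and $h_i = 1$.

```python
def optimal_leaf_weight(y_true, y_pred, lam=1.0):
    """w_j* = -G_j / (H_j + lam) for one leaf under squared loss 1/2 (y - yhat)^2.

    For this loss g_i = yhat_i - y_i and h_i = 1 (standard XGBoost result,
    assumed here; lam is the L2 leaf-weight penalty).
    """
    G = sum(p - y for y, p in zip(y_true, y_pred))
    H = float(len(y_true))
    return -G / (H + lam)


def structure_score(leaf_stats, lam=1.0, gamma=0.0):
    """Minimum objective -1/2 * sum(G_j^2 / (H_j + lam)) + gamma * T.

    `leaf_stats` is a list of (G_j, H_j) pairs, one per leaf, so T = len(leaf_stats).
    """
    return -0.5 * sum(G * G / (H + lam) for G, H in leaf_stats) + gamma * len(leaf_stats)


print(optimal_leaf_weight([10, 12, 14], [8, 8, 8], lam=0.0))  # 4.0
print(optimal_leaf_weight([10, 12, 14], [8, 8, 8], lam=1.0))  # 3.0
print(structure_score([(-12.0, 3.0)], lam=1.0, gamma=0.5))     # -17.5
```

With λ = 0, the optimal weight equals the mean residual of the leaf (4.0). A positive λ shrinks it toward zero (3.0), which is how the regularization term limits overfitting.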
And S250, determining full passenger flow volume test set sample data and passenger flow volume real data corresponding to the full passenger flow volume test set sample data according to the full passenger flow volume sample data.
The full passenger flow volume test set sample data may be test set data divided according to the full passenger flow volume sample data, for example, 30% of the full passenger flow volume sample data is used as the full passenger flow volume test set sample data.
To evaluate the prediction effect of the multi-dimensional passenger flow prediction model objectively, a certain proportion of the full passenger flow volume sample data can be set aside as full passenger flow volume test set sample data, and the real passenger flow volume data corresponding to it obtained at the same time, so that the predictions produced for the test set can be compared against the real data.
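A 30% test split can be sketched as below. The text only gives the proportion, so the chronological ordering is an assumption here. For time-series data, keeping the test set later than the training set avoids leaking future passenger flow into training.

```python
def split_train_test(samples, test_ratio=0.3):
    """Chronological split: the most recent `test_ratio` of samples form the test set.

    Assumes `samples` is already sorted by time; keeping the test set later
    than the training set avoids leaking future passenger flow into training.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    cut = len(samples) - round(len(samples) * test_ratio)
    return samples[:cut], samples[cut:]


train, test = split_train_test(list(range(10)))
print(len(train), len(test))  # 7 3
```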
And S260, inputting the full passenger flow test set sample data into the multi-dimensional passenger flow prediction model to obtain a prediction data test result.
The prediction data test result can be a test result obtained by predicting passenger flow of the sample data of the full passenger flow test set by using a multi-dimensional passenger flow prediction model.
Specifically, the full passenger flow volume test set sample data can be input into the multidimensional passenger flow volume prediction model, so that a prediction data test result is obtained through the multidimensional passenger flow volume prediction model.
S270, calculating a prediction error value between the prediction data test result and the passenger flow volume real data; wherein the prediction error value comprises a reliability error value and/or a relative mean deviation error value.
And S280, evaluating the multi-dimensional passenger flow prediction model according to the prediction error value.
The prediction error value may be a difference value between the prediction data test result and the passenger flow volume real data. The reliability error value may represent a degree of reliability of the predicted data test result, and the relative average deviation error value may represent a relative average deviation degree of the predicted data test result.
Correspondingly, after the prediction data test result is obtained, prediction error values such as the reliability error value and/or the relative average deviation error value between the prediction data test result and the real passenger flow volume data can be calculated, and the multi-dimensional passenger flow prediction model is evaluated according to these prediction error values. Smaller values of the reliability error and the relative average deviation error indicate higher prediction reliability and lower relative deviation.
Optionally, the mean absolute percentage error between the prediction data test result and the real passenger flow volume data may be calculated as the reliability error value, and the mean square percentage error between them may be calculated as the relative average deviation error value.
Alternatively, the average absolute percentage error may be calculated based on the following equation:
$$P_{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%$$
alternatively, the mean square percentage error may be calculated based on the following equation:
Figure 556343DEST_PATH_IMAGE015
In the above formulas, $P_{MAPE}$ denotes the mean absolute percentage error, $P_{MSPE}$ denotes the mean square percentage error, $y_i$ denotes the real passenger flow volume data, $\hat{y}_i$ denotes the prediction data test result, and N denotes the number of prediction data test results.
According to the embodiment of the invention, the acquired full passenger flow volume sample data of the target geographic area is subjected to data preprocessing such as data filling and data cleaning, the preset passenger flow volume prediction model is trained by using the full passenger flow volume sample data after data preprocessing, the multi-dimensional passenger flow volume prediction model is obtained, the accuracy of the multi-dimensional passenger flow volume prediction model can be improved, and the passenger flow data prediction accuracy is further improved.
It should be noted that any permutation and combination between the technical features in the above embodiments also belong to the scope of the present invention.
EXAMPLE III
FIG. 5 is a flowchart of a data prediction method according to the third embodiment of the present invention. This embodiment is applicable to predicting passenger flow data with a multi-dimensional passenger flow prediction model. The method may be executed by a data prediction apparatus, which may be implemented in software and/or hardware and is generally integrated in a computer device. As shown in FIG. 5, the method includes the following operations:
S310, acquiring full passenger flow volume real-time data of the target geographic area; the full passenger flow volume real-time data includes the real-time passenger flow volume and prediction-associated factor real-time data; the prediction-associated factor real-time data includes at least one of weather data, passenger-flow-affecting date data, and transit shift data.
The full-volume passenger flow real-time data can be multidimensional real-time data comprising real-time passenger flow data and passenger flow influence factors. The forecast associated factor real-time data may be real-time data of environmental factors affecting passenger flow, and may include, but is not limited to, weather data, passenger flow influence date data, and transit shift data.
In the embodiment of the invention, once the multi-dimensional passenger flow prediction model has been obtained by pre-training, it can be used to predict the passenger flow of the target geographic area. To do so, the full passenger flow volume real-time data of the target geographic area is acquired, including the real-time passenger flow volume and prediction-associated factor real-time data such as weather data, passenger-flow-affecting date data, and transit shift data.
In an optional embodiment of the present invention, acquiring real-time passenger flow data of a target geographic area may include: acquiring the real-time position of the user in the target geographic area; and determining the real-time passenger flow volume data according to the real-time position of the user.
The real-time user position may be determined from data that locates users in real time, such as user location signaling data.
In the embodiment of the present invention, optionally, the real-time passenger flow volume data may be determined from real-time user position data. Specifically, based on the operator's user location snapshot data, the real-time passenger flow time series of the target geographic area may be extracted from the original real-time user location signaling data, yielding the real-time passenger flow data set. Optionally, the user location signaling data may be collected at a 5-minute collection period. Meanwhile, multi-dimensional information such as short-term weather data, holiday information, and the number of transport vehicles for the target geographic area needs to be acquired in real time and combined with the real-time passenger flow volume data to build the multi-dimensional real-time data.
Table 1 is the real-time passenger flow table of the target geographic area provided in the third embodiment of the present invention, Table 2 is the transit shift information table of the target geographic area provided in the third embodiment, and Table 3 is the weather information summary table of the target geographic area provided in the third embodiment. In a specific example, as shown in Tables 1-3, the full passenger flow volume real-time data may be collected at a certain data collection period (e.g., 5 minutes). Table 4 lists the full passenger flow volume real-time data obtained by sampling, according to the third embodiment of the present invention. As shown in Table 4, after the full passenger flow volume real-time data is acquired, it may be further sampled to obtain the real-time input data of the multi-dimensional passenger flow prediction model.
TABLE 1 Real-time passenger flow table for the target geographic area
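One way to turn location snapshots into the counts used in Table 4 is sketched below. The method for extracting the time series is not specified in the text, so this sketch is an assumption. It treats each snapshot as the set of anonymized user IDs located inside the target area at one collection time, and derives the current count, inflow, and outflow from consecutive snapshots. The function names are hypothetical. The output keys mirror the Table 4 field names.

```python
def flow_from_snapshots(prev_users, curr_users):
    """Passenger counts between two consecutive location snapshots of one area.

    Hypothetical: each snapshot is the set of (anonymised) user IDs whose
    signaling places them inside the target area at that collection time.
    """
    prev, curr = set(prev_users), set(curr_users)
    return {
        "user_cnt": len(curr),         # people present now
        "user_in": len(curr - prev),   # present now, absent before
        "user_out": len(prev - curr),  # present before, absent now
    }


def flow_series(snapshots):
    """Turn time-ordered (time, user_ids) snapshots into a passenger flow series."""
    return [
        (t, flow_from_snapshots(prev, curr))
        for (_, prev), (t, curr) in zip(snapshots, snapshots[1:])
    ]


snaps = [(0, {"a", "b", "c"}), (5, {"b", "c", "d", "e"}), (10, {"d"})]
print(flow_series(snaps))
# [(5, {'user_cnt': 4, 'user_in': 2, 'user_out': 1}), (10, {'user_cnt': 1, 'user_in': 0, 'user_out': 3})]
```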
Figure 961971DEST_PATH_IMAGE017
TABLE 2 Transit shift information table for the target geographic area
Figure 170098DEST_PATH_IMAGE018
TABLE 3 summary of weather information for target geographic areas
Figure 748978DEST_PATH_IMAGE019
TABLE 4 full passenger volume real-time data List
Figure 700754DEST_PATH_IMAGE020
In Table 4, user_cnt denotes the real-time number of people present, user_in denotes the real-time number of people flowing in, user_out denotes the real-time number of people flowing out, weather_type_12 denotes the 12-hour weather forecast, fltno_in denotes the number of incoming shifts, and fltno_out denotes the number of outgoing shifts. As shown in Table 4, the full passenger flow volume real-time data can be sampled at a half-hour period to obtain the real-time input data of the model.
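Half-hour sampling of 5-minute records can be sketched as below. The aggregation rules are assumptions: user_cnt is a stock, so the last value in each window is kept, while inflow, outflow, and shift counts are summed. Categorical fields such as weather_type_12 are left out of this sketch.

```python
SUM_FIELDS = ("user_in", "user_out", "fltno_in", "fltno_out")


def resample(rows, window=30):
    """Aggregate 5-minute rows into `window`-minute rows (hypothetical rules).

    user_cnt keeps the last value in each window (a stock, not a flow);
    inflow/outflow and shift counts are summed over the window.
    """
    buckets = {}
    for row in sorted(rows, key=lambda r: r["minute"]):
        start = row["minute"] // window * window
        bucket = buckets.setdefault(start, {"minute": start, **{f: 0 for f in SUM_FIELDS}})
        bucket["user_cnt"] = row["user_cnt"]  # rows are sorted, so the last write wins
        for field in SUM_FIELDS:
            bucket[field] += row[field]
    return [buckets[start] for start in sorted(buckets)]


rows = [
    {"minute": m, "user_cnt": 1000 + m, "user_in": 3, "user_out": 2,
     "fltno_in": 1 if m % 15 == 0 else 0, "fltno_out": 0}
    for m in range(0, 60, 5)
]
half_hours = resample(rows)
print(len(half_hours), half_hours[1]["user_cnt"])  # 2 1055
```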
In an optional embodiment of the present invention, after the acquiring the real-time data of the full passenger volume in the target geographic area, the method may further include: under the condition that first abnormal real-time data exist in the full passenger flow real-time data, performing data filling on the full passenger flow real-time data; and/or performing data cleaning on the full passenger flow volume real-time data under the condition that the second abnormal real-time data exists in the full passenger flow volume real-time data.
The first abnormal real-time data may be abnormal real-time data that needs to be data-filled. The second anomalous real-time data may be anomalous real-time data that requires data cleansing.
In the embodiment of the invention, the full passenger flow volume real-time data is processed in the same way as the full passenger flow volume sample data is preprocessed during model training. Specifically, the presence of first abnormal real-time data indicates that some real-time data is missing. For example, when the time interval between the current real-time data and the previous adjacent data is large, or the current real-time data in the current statistical period is empty, the current real-time data may be determined as first abnormal real-time data, and data filling can then be performed on the full passenger flow volume real-time data, specifically on the first abnormal real-time data. The presence of second abnormal real-time data indicates that the full passenger flow volume real-time data contains outliers. For example, if the difference between the current real-time data and other real-time data exceeds a set threshold, the current real-time data may be determined as second abnormal real-time data, and data cleaning can then be performed on the full passenger flow volume real-time data, specifically by deleting the second abnormal real-time data.
In an optional embodiment of the present invention, performing data filling on the full passenger flow volume real-time data may include: acquiring the real-time acquisition time corresponding to the first abnormal real-time data; and performing data filling on the full passenger flow volume real-time data according to the real-time acquisition time and a preset data filling rule.
The real-time acquisition time may be the time at which the full passenger flow volume real-time data is collected. For example, the original location signaling data may be collected at a 5-minute collection period and the real-time passenger flow volume data determined from it; alternatively, the original passenger flow volume statistics may themselves be collected at a 5-minute collection period. The preset data filling rule is a rule set in advance for filling missing values in the full passenger flow volume real-time data.
Specifically, when the full-passenger-volume real-time data with the first abnormal real-time data is subjected to data filling, the real-time acquisition time when the full-passenger-volume real-time data is acquired may be determined, so that the full-passenger-volume real-time data is subjected to data filling processing according to the real-time acquisition time and a preset data filling rule.
In an optional embodiment of the present invention, performing data filling on the full passenger flow real-time data according to the real-time acquisition time and the preset data filling rule may include: determining the target real-time acquisition time corresponding to the first abnormal real-time data, where the real-time data corresponding to the target real-time acquisition time is empty; determining the time difference between the target real-time acquisition time and the previous real-time acquisition time; if the time difference is less than or equal to the time difference threshold, acquiring the real-time data corresponding to the previous real-time acquisition time and using it to fill the real-time data at the target real-time acquisition time; and if the time difference is greater than the time difference threshold, acquiring the real-time data at the real-time acquisition time one set period earlier and using it to fill the real-time data at the target real-time acquisition time.
The target real-time acquisition time is a real-time acquisition time whose value is missing. The time difference threshold may be set according to the acquisition period and actual requirements; for example, with a 5-minute acquisition period the threshold may be set to 15 minutes. The embodiment of the present invention does not limit the specific value of the time difference threshold. The set period may likewise be set according to actual requirements, for example 24 hours, and its specific value is not limited either.
Specifically, when data filling is performed on the full passenger flow real-time data, the target real-time acquisition time corresponding to the first abnormal real-time data may be determined first. In the embodiment of the invention, this target time can be determined in several different ways.
Optionally, the target real-time acquisition time may be determined directly according to whether real-time data exists at each real-time acquisition time. For example, assume the acquisition period is 5 minutes and the real-time acquisition times of 4 pieces of real-time data in the full passenger flow real-time data are the 5th, 10th, 15th, and 20th minutes. If the real-time data at the 10th and 15th minutes is empty, that data is first abnormal real-time data, and the 10th and 15th minutes are determined as target real-time acquisition times.
Optionally, the target real-time acquisition time may also be calculated by analyzing the real-time data that actually exists. For example, assume the acquisition period is 5 minutes and 3 pieces of real-time data in the full passenger flow real-time data were acquired at the 5th, 15th, and 30th minutes (all non-empty). Because the intervals before the 15th-minute and 30th-minute data exceed the 5-minute period, the data collected at the 15th and 30th minutes may be determined as first abnormal real-time data. Taking the 15th-minute data as an example: the 10th minute, which lies between the 5th and 15th minutes, may be determined as a target real-time acquisition time whose real-time data is empty.
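Both ways of locating target real-time acquisition times can be covered by walking the expected acquisition grid and reporting every slot that is absent or empty. The following is a minimal sketch, assuming timestamps are already aligned to the 5-minute grid starting at the first acquisition time; the function name `find_target_times` and the dictionary layout are hypothetical and not taken from the patent.

```python
from datetime import datetime, timedelta


def find_target_times(records, period_minutes=5):
    """Return acquisition times on a regular grid whose real-time data is empty.

    records maps an acquisition datetime to a passenger count, or to None when
    the slot exists but is empty. Slots absent from records entirely are also
    reported, which covers the irregular-interval case.
    Assumption: timestamps are aligned to the grid starting at the first key.
    """
    if not records:
        return []
    step = timedelta(minutes=period_minutes)
    times = sorted(records)
    targets = []
    t = times[0]
    # Walk the expected grid from the first to the last acquisition time.
    while t <= times[-1]:
        if records.get(t) is None:  # missing key or explicit empty value
            targets.append(t)
        t += step
    return targets


# Usage: data only at minutes 5, 15 and 30, as in the second example.
base = datetime(2021, 11, 1, 8, 0)
observed = {base + timedelta(minutes=m): v for m, v in [(5, 120), (15, 131), (30, 140)]}
print([t.minute for t in find_target_times(observed)])  # [10, 20, 25]
```

The first example (slots at the 10th and 15th minutes present but empty) is handled by the same loop, because an explicit `None` value is treated exactly like a missing key.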
Correspondingly, after the target real-time acquisition time to be filled is determined, the previous real-time acquisition time before it can be identified and the time difference between the two can be calculated. For example, assume the acquisition period is 5 minutes and the real-time acquisition times of 4 pieces of real-time data in the full passenger flow real-time data are the 5th, 10th, 15th, and 20th minutes. If the real-time data at the 10th and 20th minutes is empty, the previous real-time acquisition time for the 10th minute is the 5th minute, so the corresponding previous-moment data is the real-time data acquired at the 5th minute. Similarly, the previous real-time acquisition time for the 20th minute is the 15th minute, so the corresponding previous-moment data is the real-time data acquired at the 15th minute.
After the time difference between the target real-time acquisition time and the previous real-time acquisition time is determined, the data can be filled according to its value. Optionally, if the time difference is less than or equal to the time difference threshold, the target real-time acquisition time is close to the previous real-time acquisition time, and the real-time data at the previous acquisition time may be copied directly to fill the real-time data at the target real-time acquisition time. If the time difference is greater than the time difference threshold, the two times are far apart; in that case, the real-time data acquired one set period earlier, such as the data at the same time point 24 hours before the target real-time acquisition time, can be copied to fill the real-time data at the target real-time acquisition time. The benefit of this processing is that the underlying patterns of the passenger flow data are exploited as fully as possible during filling, which ensures the accuracy of the filled data and, in turn, of the real-time data.
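The two-branch filling rule can be sketched as follows, using the example values from the text (15-minute threshold, 24-hour set period). This is an illustration only: `fill_targets` is a hypothetical name, and taking the "previous acquisition time" to be the latest earlier time with an observed, non-empty value is an interpretation, chosen so that a long outage is not bridged by chaining already-filled values.

```python
from datetime import datetime, timedelta


def fill_targets(series, targets,
                 gap_threshold=timedelta(minutes=15),
                 set_period=timedelta(hours=24)):
    """Fill empty slots: copy the previous value for short gaps, else the value one set period earlier.

    series maps acquisition datetime -> value (None for empty).
    Assumption: "previous acquisition time" = latest earlier time with an observed value.
    """
    observed = {t: v for t, v in series.items() if v is not None}
    filled = dict(series)
    for t in targets:
        earlier = [p for p in observed if p < t]
        if earlier and t - max(earlier) <= gap_threshold:
            # Short gap: copy the value from the previous acquisition time.
            filled[t] = observed[max(earlier)]
        else:
            # Long gap: copy the value from one set period earlier (None if unavailable).
            filled[t] = observed.get(t - set_period)
    return filled


def at(minutes, base=datetime(2021, 11, 2, 8, 0)):
    """Helper for the usage example: base time plus a number of minutes."""
    return base + timedelta(minutes=minutes)


series = {at(0): 100, at(5): 110, at(10): None, at(15): 120, at(40): 140,
          at(35) - timedelta(hours=24): 90}  # same time point on the previous day
filled = fill_targets(series, [at(10), at(20), at(25), at(30), at(35)])
print(filled[at(10)], filled[at(30)], filled[at(35)])  # 110 120 90
```

Here the 30th-minute slot is exactly 15 minutes after the last observed value and is filled from it, while the 35th-minute slot exceeds the threshold and falls back to the previous day's value.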
In a specific example, carrier mobile phone signaling data is used as the user location data. Table 5 is the signaling data update frequency list provided in the third embodiment of the present invention. As shown in Table 5, carrier mobile phone signaling data is usually updated once every 5 minutes, but because of problems such as network delay its update frequency is not fixed, and earlier data quality analysis showed that for some signaling data the interval between adjacent time points is not 5 minutes.
Table 5 signalling data update frequency list
Therefore, signaling data whose update interval is not 5 minutes needs to be completed by filling, using different filling methods for different scenarios. The specific filling rules are shown in Table 6:
Table 6 Signalling data completion rules list
In an optional embodiment of the present invention, performing data cleaning on the full passenger flow real-time data may include: if the full passenger flow real-time data is determined to follow a normal distribution, calculating its standard deviation and cleaning the data according to that standard deviation; and if the full passenger flow real-time data is determined not to follow a normal distribution, determining benchmark reference data of the full passenger flow real-time data, the benchmark reference data being determined from the average value of the full passenger flow real-time data, and cleaning the data according to the benchmark reference data.
The benchmark reference data may be a data threshold determined from the full passenger flow real-time data and used to screen outliers in that data.
It can be understood that batch data usually follows a certain distribution pattern. Therefore, when cleaning the full passenger flow real-time data, it can first be judged whether the data follows a discernible distribution. Optionally, if the full passenger flow real-time data is determined to follow a normal distribution, data cleaning may be performed based on that normal distribution: specifically, the standard deviation of the full passenger flow real-time data may be calculated, 3 times or 6 times the standard deviation may be used as the reference threshold for data screening, and real-time data whose deviation from the mean exceeds the reference threshold may be deleted as abnormal data.
Correspondingly, if the full passenger flow real-time data is determined not to follow a normal distribution, its benchmark reference data can be determined directly. For example, the average value of the full passenger flow real-time data may be calculated, and a set multiple (e.g., 3 times or 5 times) of the average value may be used as the benchmark reference data. After the benchmark reference data is determined, any real-time value greater than the benchmark reference data indicates an excessive fluctuation range and possibly abnormal data, so it can be deleted as abnormal data.
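Both cleaning branches fit in a few lines of standard-library Python. This is a minimal sketch, not the patent's implementation: `clean_outliers` is a hypothetical name, the normality decision is passed in by the caller rather than tested here, and in the normal branch the sketch removes values whose distance from the mean exceeds the chosen multiple of the population standard deviation.

```python
import statistics


def clean_outliers(values, is_normal, sigma_multiple=3.0, mean_multiple=3.0):
    """Drop abnormal values using the normal / non-normal branches described above.

    is_normal is supplied by the caller (for example, from a normality test such
    as scipy.stats.normaltest); this sketch does not test normality itself.
    """
    mean = statistics.mean(values)
    if is_normal:
        # Normal case: drop values farther than sigma_multiple standard deviations from the mean.
        sd = statistics.pstdev(values)
        return [v for v in values if abs(v - mean) <= sigma_multiple * sd]
    # Non-normal case: benchmark = mean_multiple x mean; drop values above it.
    benchmark = mean_multiple * mean
    return [v for v in values if v <= benchmark]


print(clean_outliers([10, 12, 11, 200], is_normal=False))  # [10, 12, 11]
```

One practical caveat of the 3-sigma branch: with very few points a single outlier inflates the standard deviation enough that it may never exceed 3 sigma (with n points the largest possible z-score is (n - 1) / sqrt(n)), so the rule is only meaningful on reasonably large batches.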
Besides the above data cleaning methods, other methods may be used, such as graphical methods (e.g., box plots) or model-based methods (e.g., linear regression, clustering algorithms, and the k-nearest neighbor algorithm); the embodiment of the present invention does not limit this.
It should be noted that, the data cleaning process may be performed after the data filling process is performed on the full passenger flow volume real-time data, or only the data filling process may be performed on the full passenger flow volume real-time data, or only the data cleaning process may be performed on the full passenger flow volume real-time data, or the data preprocessing operation may be directly skipped, which is not limited in the embodiment of the present invention.
S320, inputting the full passenger flow real-time data into a multi-dimensional passenger flow prediction model to obtain passenger flow prediction data of the target geographic area.
The multi-dimensional passenger flow prediction model is obtained by training with the prediction model training method of any embodiment. The passenger flow prediction data may include predicted real-time inbound and outbound passenger flow of the target geographic area for a set period in the future (e.g., 3 hours). Optionally, the prediction data may be output according to a certain prediction period, for example every 30 minutes; the prediction period can be set according to actual requirements and is not limited by the embodiment of the present invention.
Fig. 6 is a schematic diagram of the fitting effect of the multi-dimensional passenger flow prediction model on passenger flow according to the third embodiment of the present invention. In a specific example, the multi-dimensional passenger flow prediction model was applied to predict passenger flow with an airport as the target geographic area, and the predictions were compared with the actual passenger flow data acquired afterwards, giving the fitting effect shown in Fig. 6. In Fig. 6, the solid line represents the passenger flow prediction data of the model and the dotted line represents the actual passenger flow data acquired subsequently. Fig. 6 shows that the model fits the passenger flow well.
To further verify the accuracy of the multi-dimensional passenger flow prediction model, the model was used to predict the next 3 hours of data, and the predictions were compared with the true values; the results are shown in Table 7 below:
Table 7 Comparison of the model's predictions for the next 3 hours with true values
From the data in Table 7, the mean square percentage error of the model's passenger flow prediction data over the next 3 hours is calculated to be 0.0504, and the mean absolute error is 0.0447. This indicates that the multi-dimensional passenger flow prediction model trained according to the embodiment of the invention has high prediction precision and robustness.
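The text does not give the formulas behind the two reported metrics. The sketch below uses common definitions as an assumption: mean square percentage error as the mean of squared relative errors, and mean absolute error as the mean of absolute differences. Table 7 is not reproduced in this text and its data scale is not stated, so the sketch does not attempt to recompute 0.0504 or 0.0447; the function names are hypothetical.

```python
def _check(actual, predicted):
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")


def mean_square_percentage_error(actual, predicted):
    """Mean of ((a - p) / a) ** 2; every actual value must be non-zero."""
    _check(actual, predicted)
    return sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Mean of |a - p|, in the same units as the passenger flow data."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


print(mean_absolute_error([100, 200], [110, 180]))  # 15.0
```

Because MSPE is relative and MAE is in absolute units, the two are only comparable on the same scale; an MAE below 1 suggests the reported figures were computed on normalized data, though the text does not say so.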
According to the embodiment of the invention, the multi-dimensional passenger flow prediction model can be obtained by training the preset passenger flow prediction model by using the acquired full passenger flow sample data of the target geographic area. The multi-dimensional passenger flow prediction model can be used for predicting passenger flow prediction data of the target geographic area according to the full-volume passenger flow real-time data of the target geographic area. Because the full passenger flow sample data comprises various nonlinear characteristic data such as historical passenger flow data, weather data, passenger flow influence date data, passing shift data and the like, the multi-dimensional passenger flow prediction model can comprehensively consider various passenger flow influence factors to predict the passenger flow, solves the problems of low passenger flow data prediction accuracy and the like in the conventional nonlinear passenger flow prediction method, and improves the passenger flow data prediction accuracy.
Example four
Fig. 7 is a schematic diagram of a prediction model training apparatus according to the fourth embodiment of the present invention. As shown in Fig. 7, the apparatus includes a full passenger flow sample data obtaining module 410 and a preset passenger flow prediction model training module 420, where:
a full passenger flow sample data obtaining module 410, configured to obtain full passenger flow sample data of the target geographic area, where the full passenger flow sample data includes historical passenger flow data and prediction-related factor sample data, and the prediction-related factor sample data includes at least one of weather data, passenger flow influence date data, and passing shift data;
and the preset passenger flow prediction model training module 420 is used for training a preset passenger flow prediction model according to the full passenger flow sample data to obtain a multi-dimensional passenger flow prediction model.
According to the embodiment of the invention, the multi-dimensional passenger flow prediction model can be obtained by training the preset passenger flow prediction model by using the acquired full passenger flow sample data of the target geographic area. The multi-dimensional passenger flow prediction model can be used for predicting passenger flow prediction data of the target geographic area according to the full-volume passenger flow real-time data of the target geographic area. Because the full passenger flow sample data comprises various nonlinear characteristic data such as historical passenger flow data, weather data, passenger flow influence date data, passing shift data and the like, the multi-dimensional passenger flow prediction model can comprehensively consider various passenger flow influence factors to predict the passenger flow, solves the problems of low passenger flow data prediction accuracy and the like in the conventional nonlinear passenger flow prediction method, and improves the passenger flow data prediction accuracy.
Optionally, the prediction model training apparatus may further include: the first data filling module is used for performing data filling on the full passenger flow volume sample data under the condition that the first abnormal sample data exists in the full passenger flow volume sample data; and the first data cleaning module is used for cleaning the data of the full passenger flow volume sample data under the condition that the second abnormal sample data exists in the full passenger flow volume sample data.
Optionally, the first data filling module is specifically configured to: acquire the sample acquisition time corresponding to the full passenger flow sample data; and perform data filling on the full passenger flow sample data according to the sample acquisition time and a preset data filling rule.
Optionally, the first data filling module is specifically configured to: determine the target sample acquisition time corresponding to the first abnormal sample data, where the sample data corresponding to the target sample acquisition time is empty; determine the time difference between the target sample acquisition time and the previous sample acquisition time; if the time difference is less than or equal to the time difference threshold, acquire the sample data corresponding to the previous sample acquisition time and use it to fill the sample data at the target sample acquisition time; and if the time difference is greater than the time difference threshold, acquire the sample data at the sample acquisition time one set period earlier and use it to fill the sample data at the target sample acquisition time.
Optionally, the first data cleaning module is specifically configured to: if the full passenger flow sample data is determined to follow a normal distribution, calculate its standard deviation and clean the data according to that standard deviation; and if the full passenger flow sample data is determined not to follow a normal distribution, determine benchmark reference data of the full passenger flow sample data from its average value, and clean the data according to the benchmark reference data.
Optionally, the preset passenger flow prediction model is expressed based on the following formula:
[Formula images not reproduced]
where ŷ represents the predicted passenger flow of the target geographic area at time k; x represents the time-series feature variables of the full passenger flow sample data; k represents time; n represents the identifier of the target geographic area; time(n) represents the historical passenger flow time of the target geographic area; weekday(n) and isHoliday(n) represent the passenger flow influence date data (whether the day is a weekday or weekend, and whether it is a statutory holiday); y_n(k-p), y_n(k-p+1), ..., y_n(k-1) represent the historical passenger flow at times k-p, k-p+1, ..., k-1; arrivecnt(k) and leavecnt(k) represent the passenger flow arriving at and leaving the target geographic area at time k; arrivenum(k) and leavenum(k) represent the number of passing shifts arriving at and leaving the target geographic area at time k; istfyj(n), isbyyj(n), and isdwyj(n) represent different types of weather warnings; and weather(n) represents weather data.
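Gradient-boosted tree models take a fixed-length numeric row per prediction, so the variables listed above have to be assembled in a fixed order, identical at training and prediction time. The sketch below shows one way to assemble such a row; the function name, the dictionary keys, and the feature order are all assumptions for illustration, since the original formula is not available in this text.

```python
def build_feature_vector(y, k, p, calendar, flows, warnings, weather):
    """Assemble a hypothetical feature row x for area n at time k.

    y        : historical passenger flow y_n(0), ..., y_n(k-1) for area n
    calendar : {"time": ..., "weekday": ..., "is_holiday": ...}
    flows    : {"arrivecnt": ..., "leavecnt": ..., "arrivenum": ..., "leavenum": ...} at time k
    warnings : {"istfyj": ..., "isbyyj": ..., "isdwyj": ...} as 0/1 flags
    weather  : numeric weather code
    """
    if not 0 < p <= k <= len(y):
        raise ValueError("need p past observations y_n(k-p) .. y_n(k-1)")
    lags = list(y[k - p:k])  # y_n(k-p), y_n(k-p+1), ..., y_n(k-1)
    return ([calendar["time"], calendar["weekday"], calendar["is_holiday"]]
            + lags
            + [flows["arrivecnt"], flows["leavecnt"], flows["arrivenum"], flows["leavenum"]]
            + [warnings["istfyj"], warnings["isbyyj"], warnings["isdwyj"], weather])


x = build_feature_vector(
    y=[10, 20, 30, 40, 50], k=5, p=3,
    calendar={"time": 8, "weekday": 1, "is_holiday": 0},
    flows={"arrivecnt": 120, "leavecnt": 95, "arrivenum": 6, "leavenum": 4},
    warnings={"istfyj": 0, "isbyyj": 1, "isdwyj": 0},
    weather=2,
)
print(x)  # [8, 1, 0, 30, 40, 50, 120, 95, 6, 4, 0, 1, 0, 2]
```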
Optionally, the preset passenger flow prediction model training module 420 is specifically configured to train the preset passenger flow prediction model according to an objective function and the full passenger flow sample data;
wherein the objective function is expressed based on the following formula:
[Formula image not reproduced]
where J(F) represents the objective function; L(F) and the accompanying loss expression represent the training loss function; Ω(F) and Ω(f_t) represent the regularization term; γ represents a weight parameter; T represents the number of leaf nodes of a tree in the extreme gradient boosting (XGBoost) algorithm; and w_j represents the weight of each leaf node.
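For orientation, the regularized objective of the standard extreme gradient boosting (XGBoost) formulation, which the symbols defined above appear to follow, can be written as below. Here l is a per-sample loss, m the number of training samples, M the number of trees, and λ the L2 coefficient on leaf weights; these symbols come from the standard formulation and are not named in this text, so the patent's exact formula may differ.

$$
J(F) = L(F) + \Omega(F) = \sum_{i=1}^{m} l\left(y_i, \hat{y}_i\right) + \sum_{t=1}^{M} \Omega(f_t),
\qquad
\hat{y}_i = \sum_{t=1}^{M} f_t(x_i),
\qquad
\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}
$$

In this form γ charges a fixed cost per leaf, so a larger γ favors smaller trees, while the λ term shrinks the leaf weights w_j toward zero.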
Optionally, the prediction model training apparatus may further include: a model evaluation module to: determining full passenger flow volume test set sample data and passenger flow volume real data corresponding to the full passenger flow volume test set sample data according to the full passenger flow volume sample data; inputting the full passenger flow test set sample data into the multi-dimensional passenger flow prediction model to obtain a prediction data test result; calculating a prediction error value between the prediction data test result and the passenger flow volume real data; wherein the prediction error value comprises a reliability error value and/or a relative mean deviation error value; and evaluating the multi-dimensional passenger flow volume prediction model according to the prediction error value.
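The evaluation steps above (derive a test set from the sample data, predict, compute an error, judge the model) can be sketched as follows. This is an illustration under stated assumptions: the test set is the most recent slice of time-ordered samples, the model is any callable `predict(features)`, and the mean relative absolute error is used as a stand-in for the "relative mean deviation error value" named in the text, whose exact definition is not given. `chronological_split`, `evaluate`, and `max_error` are hypothetical names.

```python
def chronological_split(samples, test_fraction=0.2):
    """Hold out the most recent test_fraction of time-ordered (features, actual) samples."""
    cut = int(len(samples) * (1 - test_fraction))
    return samples[:cut], samples[cut:]


def evaluate(predict, test_samples, max_error):
    """Return (relative error, passed) for a model exposed as predict(features) -> value.

    Relative error = mean of |actual - predicted| / actual (actual values must be non-zero).
    """
    actual = [y for _, y in test_samples]
    predicted = [predict(x) for x, _ in test_samples]
    error = sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)
    return error, error <= max_error


samples = [([i], 100 + i) for i in range(10)]  # toy time-ordered samples
train, test = chronological_split(samples)
print(len(train), len(test))  # 8 2
print(evaluate(lambda x: 100 + x[0], test, max_error=0.05))  # (0.0, True)
```

Splitting by time rather than at random keeps future observations out of the training set, which matches how the model is used to predict the next hours.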
The prediction model training apparatus can execute the prediction model training method provided by any embodiment of the present invention and has the functional modules and beneficial effects corresponding to that method. For technical details not described in this embodiment, reference may be made to the prediction model training method provided in any embodiment of the present invention.
Since the prediction model training apparatus described above can execute the prediction model training method of the embodiments of the present invention, those skilled in the art can understand, based on that method, the specific implementation of the apparatus of this embodiment and its variations; how the apparatus implements the method is therefore not described in detail here. Any apparatus used by those skilled in the art to implement the prediction model training method of the embodiments of the present invention falls within the scope of protection of the present application.
Example five
Fig. 8 is a schematic diagram of a data prediction apparatus according to a fifth embodiment of the present invention, as shown in fig. 8, the apparatus includes: a full passenger flow volume real-time data obtaining module 510 and a passenger flow volume prediction data obtaining module 520, wherein:
a full passenger flow volume real-time data obtaining module 510, configured to obtain full passenger flow volume real-time data of a target geographic area; the full passenger flow real-time data comprises real-time passenger flow and prediction related factor real-time data; the real-time data of the forecast related factors comprise at least one of weather data, passenger flow influence date data and passing shift data;
a passenger flow volume prediction data obtaining module 520, configured to input the full-volume passenger flow volume real-time data into a multidimensional passenger flow volume prediction model to obtain passenger flow volume prediction data of the target geographic area;
the multi-dimensional passenger flow prediction model is obtained by training through any one of the prediction model training methods.
According to the embodiment of the invention, the multi-dimensional passenger flow prediction model can be obtained by training the preset passenger flow prediction model by using the acquired full passenger flow sample data of the target geographic area. The multi-dimensional passenger flow prediction model can be used for predicting passenger flow prediction data of the target geographic area according to the full-volume passenger flow real-time data of the target geographic area. Because the full passenger flow sample data comprises various nonlinear characteristic data such as historical passenger flow data, weather data, passenger flow influence date data, passing shift data and the like, the multi-dimensional passenger flow prediction model can comprehensively consider various passenger flow influence factors to predict the passenger flow, solves the problems of low passenger flow data prediction accuracy and the like in the conventional nonlinear passenger flow prediction method, and improves the passenger flow data prediction accuracy.
Optionally, the full passenger flow volume real-time data obtaining module 510 is specifically configured to: acquiring the real-time position of the user in the target geographic area; and determining the real-time passenger flow volume data according to the real-time position of the user.
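Determining real-time passenger flow from user positions can be illustrated by comparing two consecutive position snapshots against the area boundary: users inside now but not before count as arrivals, users inside before but not now count as departures. This is a minimal sketch under loudly stated assumptions: the target area is modeled as a latitude/longitude bounding box, positions arrive as a dictionary of user id to (lat, lon), and a user absent from a snapshot is treated as outside. `in_area` and `passenger_flow` are hypothetical names; the output keys reuse the arrivecnt/leavecnt variable names defined for the model.

```python
def in_area(position, bbox):
    """True if (lat, lon) lies inside bbox = (min_lat, min_lon, max_lat, max_lon)."""
    lat, lon = position
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def passenger_flow(prev_positions, curr_positions, bbox):
    """Compare two snapshots (user id -> (lat, lon)) for one target area.

    A user missing from a snapshot is treated as outside the area.
    """
    prev_in = {u for u, pos in prev_positions.items() if in_area(pos, bbox)}
    curr_in = {u for u, pos in curr_positions.items() if in_area(pos, bbox)}
    return {
        "present": len(curr_in),
        "arrivecnt": len(curr_in - prev_in),  # inside now, outside before
        "leavecnt": len(prev_in - curr_in),   # inside before, outside now
    }


area = (23.0, 113.0, 23.5, 113.5)
prev = {"a": (23.1, 113.1), "b": (23.2, 113.2), "c": (24.0, 114.0)}
curr = {"a": (23.1, 113.1), "b": (24.0, 114.0), "c": (23.3, 113.3), "d": (23.4, 113.4)}
print(passenger_flow(prev, curr, area))  # {'present': 3, 'arrivecnt': 2, 'leavecnt': 1}
```

A real deployment would use the actual area geometry (for example a polygon) instead of a bounding box, and would need the gap filling described earlier because signaling snapshots do not always arrive every 5 minutes.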
Optionally, the data prediction apparatus may further include: the second data filling module is used for performing data filling on the full passenger flow volume real-time data under the condition that the first abnormal real-time data exists in the full passenger flow volume real-time data; and the second data cleaning module is used for cleaning the full passenger flow volume real-time data under the condition that the second abnormal real-time data exists in the full passenger flow volume real-time data.
Optionally, the second data filling module is specifically configured to: acquire the real-time acquisition time corresponding to the first abnormal real-time data; and perform data filling on the full passenger flow real-time data according to the real-time acquisition time and a preset data filling rule.
Optionally, the second data filling module is specifically configured to: determine the target real-time acquisition time corresponding to the first abnormal real-time data, where the real-time data corresponding to the target real-time acquisition time is empty; determine the time difference between the target real-time acquisition time and the previous real-time acquisition time; if the time difference is less than or equal to the time difference threshold, acquire the real-time data corresponding to the previous real-time acquisition time and use it to fill the real-time data at the target real-time acquisition time; and if the time difference is greater than the time difference threshold, acquire the real-time data at the real-time acquisition time one set period earlier and use it to fill the real-time data at the target real-time acquisition time.
Optionally, the second data cleaning module is specifically configured to: if the full passenger flow real-time data is determined to follow a normal distribution, calculate its standard deviation and clean the data according to that standard deviation; and if the full passenger flow real-time data is determined not to follow a normal distribution, determine benchmark reference data of the full passenger flow real-time data from its average value, and clean the data according to the benchmark reference data.
The data prediction apparatus can execute the data prediction method provided by any embodiment of the present invention and has the functional modules and beneficial effects corresponding to that method. For technical details not described in this embodiment, reference may be made to the data prediction method provided in any embodiment of the present invention.
Since the data prediction apparatus described above can execute the data prediction method of the embodiments of the present invention, those skilled in the art can understand, based on that method, the specific implementation of the data prediction apparatus of this embodiment and its variations; how the apparatus implements the method is therefore not described in detail here. Any apparatus used by those skilled in the art to implement the data prediction method of the embodiments of the present invention falls within the scope of protection of the present application.
Example six
Fig. 9 is a schematic structural diagram of a computer device according to a sixth embodiment of the present invention. FIG. 9 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in fig. 9 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present invention.
As shown in FIG. 9, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors 16, a memory 28, and a bus 18 that connects the various system components (including the memory 28 and the processors 16).
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an enhanced ISA bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 9, commonly referred to as a "hard drive"). Although not shown in FIG. 9, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a compact disc read-only memory (CD-ROM), a digital video disc read-only memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 through one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a networking environment. Program modules 42 generally carry out the functions and/or methods of the embodiments described in the present invention.
Computer device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24), with one or more devices that enable a user to interact with computer device 12, and/or with any device (e.g., a network card or modem) that enables computer device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. Moreover, computer device 12 may communicate with one or more networks (e.g., a local area network (LAN) or a wide area network (WAN)) and/or a public network (e.g., the Internet) via network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be appreciated that, although not shown in FIG. 9, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Array of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
The processor 16 runs the programs stored in the memory 28 to execute various functional applications and data processing, for example to implement the prediction model training method provided by the embodiments of the present invention: acquiring full passenger flow sample data of a target geographic area, wherein the full passenger flow sample data comprises historical passenger flow data and prediction-related factor sample data, and the prediction-related factor sample data comprises at least one of weather data, passenger flow influence date data, and passing shift data; and training a preset passenger flow prediction model according to the full passenger flow sample data to obtain a multi-dimensional passenger flow prediction model.
Example seven
The seventh embodiment of the present invention provides a device for executing the data prediction method of any embodiment of the present invention. The device comprises one or more processors and a storage apparatus configured to store one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the data prediction method provided by any embodiment of the invention: acquiring full passenger flow real-time data of a target geographic area, wherein the full passenger flow real-time data comprises real-time passenger flow data and prediction-related factor real-time data, and the prediction-related factor real-time data comprises at least one of weather data, passenger flow influence date data, and passing shift data; and inputting the full passenger flow real-time data into a multi-dimensional passenger flow prediction model to obtain passenger flow prediction data of the target geographic area, the multi-dimensional passenger flow prediction model being obtained by training with the prediction model training method of any embodiment. For the specific structure and details, refer to Fig. 9 and the sixth embodiment.
Example eight
The eighth embodiment of the present invention further provides a computer storage medium storing a computer program which, when executed by a computer processor, performs the prediction model training method of any of the above embodiments of the present invention: acquiring full passenger flow sample data of a target geographic area, wherein the full passenger flow sample data comprises historical passenger flow data and prediction-related factor sample data, and the prediction-related factor sample data comprises at least one of weather data, passenger flow influence date data, and passing shift data; and training a preset passenger flow prediction model according to the full passenger flow sample data to obtain a multi-dimensional passenger flow prediction model.
Computer storage media of embodiments of the invention may employ any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
Example nine
The ninth embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data prediction method described in the foregoing embodiments: acquiring full passenger flow real-time data of a target geographic area, wherein the full passenger flow real-time data comprises real-time passenger flow data and prediction-related factor real-time data, and the prediction-related factor real-time data comprises at least one of weather data, passenger flow influence date data, and passing shift data; and inputting the full passenger flow real-time data into a multi-dimensional passenger flow prediction model to obtain passenger flow prediction data of the target geographic area, the multi-dimensional passenger flow prediction model being obtained by training with the prediction model training method of any embodiment. For details, refer to the eighth embodiment.
It should be noted that the foregoing describes only the preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made without departing from the scope of the invention. Therefore, although the present invention has been described in detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the spirit of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims (18)

1. A predictive model training method, comprising:
acquiring full passenger flow sample data of a target geographic area, wherein the full passenger flow sample data comprises historical passenger flow data and prediction-related factor sample data, and the prediction-related factor sample data comprises at least one of weather data, passenger flow influence date data, and passing shift data; and
training a preset passenger flow prediction model according to the full passenger flow sample data to obtain a multi-dimensional passenger flow prediction model.
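The sample assembly of claim 1 can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the historical passenger flow and each prediction-related factor are stored as mappings from an integer time index to a numeric value; the function name `assemble_full_samples` and the factor names are illustrative and are not taken from the patent. A missing factor value is left as `None` so that a later data filling step can handle it.

```python
from typing import Dict, List, Optional

Series = Dict[int, float]  # time index -> value (hypothetical storage layout)


def assemble_full_samples(
    flow: Series, factors: Dict[str, Series]
) -> List[Dict[str, Optional[float]]]:
    """Join historical passenger flow with prediction-related factor data by time.

    Hypothetical layout: `factors` maps a factor name (e.g. "weather",
    "isHoliday", "arrivenum") to its own time-indexed series.
    """
    if not factors:
        # Claim 1 requires at least one prediction-related factor.
        raise ValueError("at least one prediction-related factor is required")
    rows: List[Dict[str, Optional[float]]] = []
    for t in sorted(flow):
        row: Dict[str, Optional[float]] = {"t": float(t), "flow": flow[t]}
        for name, series in factors.items():
            # A missing factor value stays None for the later filling step.
            row[name] = series.get(t)
        rows.append(row)
    return rows
```

Each resulting row pairs one historical passenger flow value with the factor values observed at the same time index.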
2. The method of claim 1, further comprising, after the acquiring of the full passenger flow sample data of the target geographic area:
in a case where first abnormal sample data exist in the full passenger flow sample data, performing data filling on the full passenger flow sample data; and/or
in a case where second abnormal sample data exist in the full passenger flow sample data, performing data cleaning on the full passenger flow sample data.
3. The method of claim 2, wherein the performing of data filling on the full passenger flow sample data comprises:
acquiring the sample acquisition time corresponding to the full passenger flow sample data; and
performing data filling on the full passenger flow sample data according to the sample acquisition time and a preset data filling rule.
4. The method of claim 3, wherein the performing of data filling on the full passenger flow sample data according to the sample acquisition time and the preset data filling rule comprises:
determining a target sample acquisition time corresponding to the first abnormal sample data, wherein the sample data corresponding to the target sample acquisition time is empty;
determining a time difference between the target sample acquisition time and the sample acquisition time immediately preceding it;
in a case where the time difference is less than or equal to a time difference threshold, acquiring the sample data corresponding to the preceding sample acquisition time, and filling the sample data for the target sample acquisition time according to that sample data; and
in a case where the time difference is greater than the time difference threshold, acquiring the sample data corresponding to the sample acquisition time separated from the target by a set period, and filling the sample data for the target sample acquisition time according to that sample data.
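The two-branch filling rule of claim 4 can be sketched in Python as follows. The claim fixes neither the threshold, nor the set period, nor exactly which "previous" acquisition time is meant; this sketch assumes the previous time is the most recent earlier time that actually holds observed data, and that the set period is something like one hour, one day, or one week. The function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def fill_missing(
    series: Dict[datetime, Optional[float]],
    threshold: timedelta,
    period: timedelta,
) -> Dict[datetime, Optional[float]]:
    """Fill empty samples with the two-branch rule of claim 4 (sketch).

    series: acquisition time -> passenger flow, with None marking an empty sample.
    threshold: hypothetical time difference threshold.
    period: hypothetical set period, e.g. timedelta(days=1).
    """
    filled = dict(series)  # do not mutate the caller's data
    # Assumption: "previous acquisition time" = latest earlier time with observed data.
    last_valid: Optional[datetime] = None
    for t in sorted(filled):
        if series[t] is not None:
            last_valid = t
            continue
        if last_valid is not None and t - last_valid <= threshold:
            # Short gap: copy the value from the previous acquisition time.
            filled[t] = series[last_valid]
        else:
            # Long gap: copy the value from one set period earlier (None if absent).
            filled[t] = filled.get(t - period)
    return filled
```

Using the observed series (not the filled one) to track `last_valid` means a long run of empty samples eventually exceeds the threshold and switches to the periodic branch, which matches the intent of the second branch.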
5. The method of claim 2, wherein the performing of data cleaning on the full passenger flow sample data comprises:
in a case where the full passenger flow sample data is determined to follow a normal distribution, calculating the standard deviation of the full passenger flow sample data, and
performing data cleaning on the full passenger flow sample data according to the standard deviation;
in a case where the full passenger flow sample data is determined not to follow a normal distribution, determining benchmark reference data of the full passenger flow sample data, wherein the benchmark reference data is determined according to the mean of the full passenger flow sample data; and
performing data cleaning on the full passenger flow sample data according to the benchmark reference data.
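Claim 5 names two cleaning branches but specifies neither the normality test nor the exact cleaning criteria. The Python sketch below fills those gaps with labeled assumptions: a crude skewness/kurtosis heuristic stands in for a proper normality test, the normal branch applies the common three-sigma rule, and the non-normal branch drops values that deviate from the mean-based benchmark by more than a hypothetical multiple of that benchmark.

```python
import statistics
from typing import List, Sequence


def looks_normal(values: Sequence[float], tol: float = 1.0) -> bool:
    """Crude normality heuristic (assumption): small skewness and excess kurtosis."""
    n = len(values)
    if n < 8:
        return False  # too few points to judge; use the benchmark branch
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return False
    skew = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    excess_kurt = sum((v - mean) ** 4 for v in values) / (n * sd ** 4) - 3.0
    return abs(skew) < tol and abs(excess_kurt) < tol


def clean(values: Sequence[float], k_sigma: float = 3.0, ratio: float = 2.0) -> List[float]:
    """Drop outliers using the normal or the benchmark branch of claim 5 (sketch)."""
    if looks_normal(values):
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)
        # Normal branch: keep values within k_sigma standard deviations of the mean.
        return [v for v in values if abs(v - mean) <= k_sigma * sd]
    # Non-normal branch: the benchmark is the mean, as in the claim; the
    # "ratio * benchmark" tolerance is a hypothetical choice.
    benchmark = statistics.fmean(values)
    return [v for v in values if abs(v - benchmark) <= ratio * abs(benchmark)]
```

In practice a formal test such as Shapiro-Wilk (for example `scipy.stats.shapiro`) would usually replace the heuristic.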
6. The method of claim 1, wherein the preset passenger flow prediction model is expressed as follows:
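$$\hat{y}_n(k) = F\big(x_n(k)\big)$$
$$x_n(k) = \big[\mathrm{time}(n),\ \mathrm{weekday}(n),\ \mathrm{isHoliday}(n),\ y_n(k-p),\ y_n(k-p+1),\ \ldots,\ y_n(k-1),\ \mathrm{arrivecnt}(k),\ \mathrm{leavecnt}(k),\ \mathrm{arrivenum}(k),\ \mathrm{leavenum}(k),\ \mathrm{istfyj}(n),\ \mathrm{isbyyj}(n),\ \mathrm{isdwyj}(n),\ \mathrm{weather}(n)\big]$$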
where ŷ denotes the predicted passenger flow of the target geographic area at time k; x denotes the time-series feature variable of the full passenger flow sample data; k denotes time; n denotes the identifier of the target geographic area; time(n) denotes the historical passenger flow time of the target geographic area; weekday(n) and isHoliday(n) denote the passenger flow influence date data (whether the day is a working day or a weekend, and whether it is a statutory holiday); y_n(k-p), y_n(k-p+1), ..., y_n(k-1) denote the historical passenger flow at times k-p, k-p+1, ..., k-1; arrivecnt(k) and leavecnt(k) denote the passenger flow arriving at and leaving the target geographic area at time k; arrivenum(k) and leavenum(k) denote the number of passing shifts arriving at and leaving the target geographic area at time k; istfyj(n), isbyyj(n) and isdwyj(n) denote different types of weather warnings; and weather(n) denotes the weather data.
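Assuming, hypothetically, that every quantity in claim 6 is already encoded as a number (for example, weather as a category code and each warning flag as 0 or 1), the feature vector x_n(k) could be assembled as in this Python sketch. The function name and dictionary layout are illustrative; only the variable names come from the claim.

```python
from typing import Dict, List, Sequence


def build_features(
    k: int,
    p: int,
    flow: Sequence[float],
    step: Dict[str, float],
    area: Dict[str, float],
) -> List[float]:
    """Assemble x_n(k) in the order listed in claim 6 (illustrative encoding).

    Assumption: every input is already numeric.
    flow: historical passenger flow y_n(0), y_n(1), ... for area n.
    step: arrivecnt, leavecnt, arrivenum, leavenum at time k.
    area: time, weekday, isHoliday, istfyj, isbyyj, isdwyj, weather for area n.
    """
    if p < 1 or k < p or len(flow) < k:
        raise ValueError("need p >= 1 and observations y_n(k-p) .. y_n(k-1)")
    lags = list(flow[k - p:k])  # y_n(k-p), ..., y_n(k-1)
    return (
        [area["time"], area["weekday"], area["isHoliday"]]
        + lags
        + [step["arrivecnt"], step["leavecnt"], step["arrivenum"], step["leavenum"]]
        + [area["istfyj"], area["isbyyj"], area["isdwyj"], area["weather"]]
    )
```

Paired with the observed y_n(k), such vectors would form the training rows for a gradient-boosted tree model of the kind referred to in claim 7.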
7. The method of claim 1, wherein the training of the preset passenger flow prediction model according to the full passenger flow sample data comprises:
training the preset passenger flow prediction model according to an objective function and the full passenger flow sample data,
wherein the objective function is expressed as follows:
$$J(F) = L(F) + \Omega(F)$$
where J(F) denotes the objective function; L(F) denotes the training loss function; Ω(F) and Ω(f_t) denote regularization terms, Ω(f_t) being the regularization term of the t-th tree; γ denotes a weight parameter; T denotes the number of leaf nodes of a tree in the extreme gradient boosting (XGBoost) algorithm; and w_j denotes the weight of the j-th leaf node.
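For orientation, the standard XGBoost objective adds to the training loss a per-tree regularization term Ω(f_t) = γT + ½λΣ_j w_j². Claim 7 defines γ, T and w_j but not λ, so the L2 coefficient `lam` below is an assumption, as is the squared-error loss. This Python sketch only evaluates J(F) for given predictions and leaf weights; it does not train trees.

```python
from typing import Sequence


def tree_regularizer(leaf_weights: Sequence[float], gamma: float, lam: float) -> float:
    """Omega(f_t) = gamma * T + 0.5 * lam * sum_j w_j**2 (standard XGBoost form).

    T is the number of leaves of tree f_t; lam is an assumed L2 coefficient.
    """
    T = len(leaf_weights)
    return gamma * T + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    trees: Sequence[Sequence[float]],
    gamma: float,
    lam: float,
) -> float:
    """J(F) = L(F) + Omega(F), with squared-error loss (assumption) and
    Omega(F) taken as the sum of Omega(f_t) over all trees."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    loss = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    return loss + sum(tree_regularizer(w, gamma, lam) for w in trees)
```

A larger γ penalizes trees with many leaves, while λ shrinks individual leaf weights; both act against overfitting the historical passenger flow.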
8. The method of claim 1, further comprising, after the training of the preset passenger flow prediction model according to the full passenger flow sample data:
determining, from the full passenger flow sample data, full passenger flow test set sample data and the true passenger flow data corresponding to the full passenger flow test set sample data;
inputting the full passenger flow test set sample data into the multi-dimensional passenger flow prediction model to obtain a prediction test result;
calculating a prediction error value between the prediction test result and the true passenger flow data, wherein the prediction error value comprises a reliability error value and/or a relative mean deviation error value; and
evaluating the multi-dimensional passenger flow prediction model according to the prediction error value.
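Claim 8 defines neither the reliability error nor the relative mean deviation error. The Python sketch below assumes a chronological hold-out split and uses a MAPE-style mean relative deviation as a stand-in for the latter; it does not attempt the reliability error. `model` is any callable that maps one feature sample to a prediction, and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")


def chronological_split(
    samples: Sequence[X], targets: Sequence[float], test_size: int
) -> Tuple[List[X], List[float], List[X], List[float]]:
    """Hold out the last `test_size` samples as the test set (assumed split rule)."""
    if not 0 < test_size < len(samples):
        raise ValueError("test_size must leave both sets non-empty")
    cut = len(samples) - test_size
    return list(samples[:cut]), list(targets[:cut]), list(samples[cut:]), list(targets[cut:])


def relative_mean_deviation(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |pred - true| / |true| over nonzero ground truth (MAPE-style stand-in)."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        raise ValueError("no nonzero ground-truth values")
    return sum(abs(p - t) / abs(t) for t, p in pairs) / len(pairs)


def evaluate(model: Callable[[X], float], test_x: Sequence[X], test_y: Sequence[float]) -> float:
    """Predict the test set and score it against the true passenger flow."""
    preds = [model(x) for x in test_x]
    return relative_mean_deviation(test_y, preds)
```

A chronological rather than random split avoids leaking future passenger flow into training, which matters for time-series data of this kind.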
9. A data prediction method, comprising:
acquiring full passenger flow real-time data of a target geographic area, wherein the full passenger flow real-time data comprises real-time passenger flow data and prediction-related factor real-time data, and the prediction-related factor real-time data comprises at least one of weather data, passenger flow influence date data, and passing shift data; and
inputting the full passenger flow real-time data into a multi-dimensional passenger flow prediction model to obtain passenger flow prediction data of the target geographic area,
wherein the multi-dimensional passenger flow prediction model is obtained by training with the prediction model training method of any one of claims 1 to 8.
10. The method of claim 9, wherein acquiring the real-time passenger flow data of the target geographic area comprises:
acquiring real-time positions of users in the target geographic area; and
determining the real-time passenger flow data according to the real-time positions of the users.
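Claim 10 derives real-time passenger flow from user positions without saying how. One possible reading, sketched in Python under stated assumptions, counts the distinct users whose most recent position fix within a recent time window lies inside a rectangular latitude/longitude box for the target area. The window length, the rectangle, and the `PositionFix` type are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class PositionFix:
    """Hypothetical position record reported for one user."""
    user_id: str
    lat: float
    lon: float
    timestamp: float  # seconds since the epoch


def realtime_flow(
    fixes: Iterable[PositionFix],
    bbox: Tuple[float, float, float, float],
    now: float,
    window_s: float = 300.0,
) -> int:
    """Count distinct users whose latest fix in [now - window_s, now] lies in bbox.

    bbox = (min_lat, min_lon, max_lat, max_lon); the target area is assumed
    to be a latitude/longitude rectangle.
    """
    latest: Dict[str, PositionFix] = {}
    for f in fixes:
        if now - window_s <= f.timestamp <= now:
            cur = latest.get(f.user_id)
            if cur is None or f.timestamp > cur.timestamp:
                latest[f.user_id] = f  # keep each user's most recent fix only
    min_lat, min_lon, max_lat, max_lon = bbox
    return sum(
        1
        for f in latest.values()
        if min_lat <= f.lat <= max_lat and min_lon <= f.lon <= max_lon
    )
```

Keeping only each user's latest fix prevents a user who reports several positions in the window from being counted more than once.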
11. The method of claim 9, further comprising, after the acquiring of the full passenger flow real-time data of the target geographic area:
in a case where first abnormal real-time data exist in the full passenger flow real-time data, performing data filling on the full passenger flow real-time data; and/or
in a case where second abnormal real-time data exist in the full passenger flow real-time data, performing data cleaning on the full passenger flow real-time data.
12. The method of claim 11, wherein the performing of data filling on the full passenger flow real-time data comprises:
acquiring the real-time acquisition time corresponding to the first abnormal real-time data; and
performing data filling on the full passenger flow real-time data according to the real-time acquisition time and a preset data filling rule.
13. The method of claim 12, wherein the performing of data filling on the full passenger flow real-time data according to the real-time acquisition time and the preset data filling rule comprises:
determining a target real-time acquisition time corresponding to the first abnormal real-time data, wherein the real-time data corresponding to the target real-time acquisition time is null;
determining a time difference between the target real-time acquisition time and the real-time acquisition time immediately preceding it;
in a case where the time difference is less than or equal to a time difference threshold, acquiring the real-time data corresponding to the preceding real-time acquisition time, and filling the real-time data for the target real-time acquisition time according to that real-time data; and
in a case where the time difference is greater than the time difference threshold, acquiring the real-time data corresponding to the real-time acquisition time separated from the target by a set period, and filling the real-time data for the target real-time acquisition time according to that real-time data.
14. The method of claim 11, wherein the performing of data cleaning on the full passenger flow real-time data comprises:
in a case where the full passenger flow real-time data is determined to follow a normal distribution, calculating the standard deviation of the full passenger flow real-time data, and
performing data cleaning on the full passenger flow real-time data according to the standard deviation;
in a case where the full passenger flow real-time data is determined not to follow a normal distribution, determining benchmark reference data of the full passenger flow real-time data, wherein the benchmark reference data is determined according to the mean of the full passenger flow real-time data; and
performing data cleaning on the full passenger flow real-time data according to the benchmark reference data.
15. A prediction model training apparatus, comprising:
a full passenger flow sample data acquisition module, configured to acquire full passenger flow sample data of a target geographic area, wherein the full passenger flow sample data comprises historical passenger flow data and prediction-related factor sample data, and the prediction-related factor sample data comprises at least one of weather data, passenger flow influence date data, and passing shift data; and
a preset passenger flow prediction model training module, configured to train a preset passenger flow prediction model according to the full passenger flow sample data to obtain a multi-dimensional passenger flow prediction model.
16. A data prediction apparatus, comprising:
a full passenger flow real-time data acquisition module, configured to acquire full passenger flow real-time data of a target geographic area, wherein the full passenger flow real-time data comprises real-time passenger flow data and prediction-related factor real-time data, and the prediction-related factor real-time data comprises at least one of weather data, passenger flow influence date data, and passing shift data; and
a passenger flow prediction data acquisition module, configured to input the full passenger flow real-time data into a multi-dimensional passenger flow prediction model to obtain passenger flow prediction data of the target geographic area,
wherein the multi-dimensional passenger flow prediction model is obtained by training with the prediction model training method of any one of claims 1 to 8.
17. A computer device, comprising:
one or more processors; and
a storage apparatus configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the prediction model training method of any one of claims 1 to 8 or the data prediction method of any one of claims 9 to 14.
18. A computer storage medium storing a computer program which, when executed by a processor, implements the prediction model training method of any one of claims 1 to 8 or the data prediction method of any one of claims 9 to 14.
CN202111300204.3A 2021-11-04 2021-11-04 Prediction model training method, prediction model training device, prediction model data prediction method, prediction model data prediction device, prediction model data prediction equipment and storage medium Pending CN114037140A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111300204.3A CN114037140A (en) 2021-11-04 2021-11-04 Prediction model training method, prediction model training device, prediction model data prediction method, prediction model data prediction device, prediction model data prediction equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111300204.3A CN114037140A (en) 2021-11-04 2021-11-04 Prediction model training method, prediction model training device, prediction model data prediction method, prediction model data prediction device, prediction model data prediction equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114037140A true CN114037140A (en) 2022-02-11

Family

ID=80142767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111300204.3A Pending CN114037140A (en) 2021-11-04 2021-11-04 Prediction model training method, prediction model training device, prediction model data prediction method, prediction model data prediction device, prediction model data prediction equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114037140A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114692760A (en) * 2022-03-30 2022-07-01 中国民航科学技术研究院 Descent rate estimation model construction method, descent rate estimation device and electronic equipment
CN115496105A (en) * 2022-09-28 2022-12-20 广东省新黄埔中医药联合创新研究院 Sleep prediction model training method, sleep condition prediction method and related device
CN115496105B (en) * 2022-09-28 2023-10-24 广东省新黄埔中医药联合创新研究院 Sleep prediction model training method, sleep condition prediction method and related devices
CN116703011A (en) * 2023-08-09 2023-09-05 民航机场规划设计研究总院有限公司 Aviation passenger flow distribution prediction method and device, electronic equipment and storage medium
CN116703011B (en) * 2023-08-09 2023-10-20 民航机场规划设计研究总院有限公司 Aviation passenger flow distribution prediction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination