CN114596702A - Traffic state prediction model construction method and traffic state prediction method - Google Patents


Publication number
CN114596702A
Authority
CN
China
Legal status
Granted
Application number
CN202210170462.2A
Other languages
Chinese (zh)
Other versions
CN114596702B
Inventor
杨丽丽
孟繁宇
曾益萍
袁狄平
王倩倩
Current Assignee
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Application filed by Southwest University of Science and Technology
Publication of CN114596702A
Application granted
Publication of CN114596702B
Current legal status: Active

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0108 Measuring and analyzing of parameters relative to traffic conditions based on the source of data
    • G08G1/0125 Traffic data processing
    • G08G1/0129 Traffic data processing for creating historical data or processing based on historical data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The application relates to the technical field of intelligent traffic and provides a traffic state prediction model construction method and a traffic state prediction method. The model construction method comprises the following steps: acquiring first data, second data and third data, wherein the first data comprise historical traffic state data of a road section to be predicted, the second data comprise historical traffic state data of the intersections upstream and downstream of the road section to be predicted, the third data comprise features of the first data and features of the second data, and the features comprise spatial features and/or temporal features; fusing the first data, the second data and the third data to obtain training sample data; constructing a traffic state prediction model based on the training sample data and screening out associated features, the associated features being features whose importance is greater than a preset importance; and establishing a final traffic state prediction model based on the associated features. In this way, prediction accuracy can be improved, the prediction results are consistent with actual traffic conditions, and bidirectional traffic flow can be predicted accurately.

Description

Traffic state prediction model construction method and traffic state prediction method
Technical Field
The application belongs to the technical field of intelligent traffic, and particularly relates to a traffic state prediction model construction method and a traffic state prediction method.
Background
At present, modern traffic management and emergency resource scheduling require up-to-date information on the operating state of road traffic, so that the traffic state of the whole network can be understood globally and decision makers can formulate plans such as congestion relief, accident handling and rescue route planning. Generally, obtaining and visualizing the operating state of a road network relies on accurate prediction of the traffic states of road sections and intersections, including their speeds, flow rates and travel times.
However, conventional traffic state prediction methods usually capture the spatial correlation of traffic information only through a correlation matrix estimated from the road network topology or learned from historical traffic data. This requires extensive simplification of, and strong assumptions about, the actual road network structure, and neglects actual traffic conditions and secondary or unknown factors. As a result, the accuracy of road network state prediction is reduced, which in turn affects the plans formulated by decision makers.
Disclosure of Invention
The embodiments of the present application provide a traffic state prediction model construction method and a traffic state prediction method, which can solve the problem of low prediction accuracy.
In a first aspect, an embodiment of the present application provides a method for constructing a traffic state prediction model, including:
acquiring first data, second data and third data, wherein the first data comprise historical traffic state data of a road section to be predicted, the second data comprise historical traffic state data of the intersections upstream and downstream of the road section to be predicted, the third data comprise features of the first data and features of the second data, and the features comprise spatial features and/or temporal features;
fusing the first data, the second data and the third data to obtain training sample data;
constructing a traffic state prediction model based on the training sample data, and screening out associated features, wherein the associated features are features whose importance is greater than a preset importance;
and establishing a final traffic state prediction model based on the associated features.
In a possible implementation manner of the first aspect, training a prediction model and screening out associated features specifically includes:
dividing the training sample data into a first test set and a first verification set according to a preset dividing proportion;
setting the characteristics corresponding to the first verification set as decision attributes, setting the characteristics corresponding to the first test set as condition attributes, and establishing and training the traffic state prediction model based on a preset loss function;
in the training process, calculating the importance of the features;
and if the importance of a feature is greater than the preset importance, selecting that feature as an associated feature.
Further, calculating the importance of the features specifically includes:
acquiring, at each split, the score by which the feature improves the traffic state prediction model;
and calculating a squared weighting of the scores.
In a possible implementation manner of the first aspect, establishing a final traffic state prediction model based on the associated features specifically includes:
establishing an initial traffic state prediction model using the associated features;
dividing the training sample data into a second test set and a second verification set according to a preset dividing proportion;
and training to obtain a final traffic state prediction model based on the second test set, the second verification set and the initial traffic state prediction model.
Further, the preset loss function is a square loss function;
the objective function of the traffic state prediction model is as follows:
$$\mathrm{obj} = \sum_{t}\left(y_t - \left(\hat{y}_t^{(t-1)} + f_t(x_t)\right)\right)^2 + \sum_{i}\Omega(f_i)$$
where $y_t$ is the actual traffic state value of the road section to be predicted at step t, $\hat{y}_t^{(t-1)}$ is the predicted value obtained by the traffic state prediction model at step t-1, $f_t(x_t)$ is the transformation function, $x_t$ is the attribute, and $\Omega(f_i)$ is the regularization term of the i-th tree:
$$\Omega(f_i) = \gamma M + \frac{1}{2}\lambda\sum_{j=1}^{M}\omega_j^{2}$$
where $\gamma$ is the threshold controlling node splitting, $\lambda$ is the L2 regularization weight, $\omega_j$ is the score of the j-th leaf, and $M$ is the number of leaves.
Further, fusing the first data, the second data, and the third data to obtain training sample data, including:
completing the first data and the second data to obtain the completed first data and second data;
and fusing the completed first data and second data with the third data to obtain the training sample data.
In a possible implementation manner of the first aspect, the completing the first data and the second data specifically includes:
classifying each feature as a continuous variable or a discrete variable according to its attribute;
sorting the features of each attribute type according to the total amount of missing data of each feature;
if a feature is a continuous variable, initializing the values of its missing data with the median over adjacent time periods or over all time periods;
and/or, if a feature is a discrete variable, initializing the values of its missing data with the mode over adjacent time periods or over all time periods.
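The initialization step above can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation. It rests on several assumptions. Each feature is a time-ordered list in which `None` marks a missing period. Features are sorted in ascending order of missing count, since the text does not state the direction. "Adjacent time periods" is read as a symmetric window of `window` periods around each gap. The function names `sort_by_missing` and `initial_fill` are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import Dict, List, Optional, Sequence


def sort_by_missing(features: Dict[str, Sequence]) -> List[str]:
    """Return feature names ordered by ascending number of missing values (assumed order)."""
    return sorted(features, key=lambda name: sum(v is None for v in features[name]))


def initial_fill(values: Sequence, continuous: bool,
                 window: Optional[int] = None) -> list:
    """Initialize missing entries (None) with a median or a mode.

    continuous=True uses the median, otherwise the mode. With window=None
    all observed periods are used; otherwise only observed values within
    `window` periods on either side of the gap, falling back to all
    periods when that neighbourhood has no observations.
    """
    observed = [v for v in values if v is not None]
    filled = list(values)
    for t, v in enumerate(values):
        if v is not None:
            continue
        pool = observed
        if window is not None:
            lo, hi = max(0, t - window), min(len(values), t + window + 1)
            pool = [x for x in values[lo:hi] if x is not None] or observed
        if pool:  # a feature with no observations at all is left as-is
            filled[t] = median(pool) if continuous else Counter(pool).most_common(1)[0][0]
    return filled


speeds = [31.0, None, 29.0, 45.0]
print(initial_fill(speeds, continuous=True))            # [31.0, 31.0, 29.0, 45.0]
print(initial_fill(speeds, continuous=True, window=1))  # [31.0, 30.0, 29.0, 45.0]
```

Using the whole history is more robust for long gaps. The windowed variant follows short-term trends more closely.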
Further, if a feature is a continuous variable, initializing the values of its missing data with the median over adjacent time periods or over all time periods specifically includes:
obtaining, in each iteration, a first new data set after the missing values of each continuous variable have been initialized with the median;
calculating the difference between each first new data set and the corresponding first old data set, and summing the differences to obtain a first sum;
if the first sum is smaller than a preset difference, stopping the completion;
and/or, if a feature is a discrete variable, initializing the values of its missing data with the mode over adjacent time periods or over all time periods specifically includes:
obtaining, in each iteration, a second new data set after the missing values of each discrete variable have been initialized with the mode;
calculating the difference between each second new data set and the corresponding second old data set, and summing the differences to obtain a second sum;
and if the second sum is smaller than the preset difference, stopping the completion.
For example, calculating a difference between each of the first new data sets and the corresponding first old data set, and summing the differences to obtain a first sum, specifically includes:
the calculation is performed according to the following formula:
$$\Delta N = \frac{\sum_{j}\left(D_n - D_o\right)^2}{\sum_{j} D_n^{2}}$$
where $\Delta N$ is the first sum, $j$ indexes the sorted continuous variables, $D_n$ is the value of a missing continuous variable in the first new data set, and $D_o$ is the value of that missing continuous variable in the first old data set;
and/or calculating the difference between each new second data set and the corresponding old second data set, and summing the differences to obtain a second sum, specifically comprising:
the calculation is performed according to the following formula:
$$\Delta F = \frac{\sum_{j}\sum_{i} I\left(x_n \neq x_o\right)}{N_{mis}}$$
where $\Delta F$ is the second sum, $j$ indexes the sorted continuous variables, $i$ indexes the sorted discrete variables, $x_n$ is the value of a missing discrete variable in the second new data set, $x_o$ is the value of that missing discrete variable in the second old data set, $I$ is the indicator function, which equals 1 if $x_n \neq x_o$ and 0 otherwise, and $N_{mis}$ is the total number of missing entries in the discrete variables.
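The two stopping criteria can be computed as shown below. This sketch assumes the missForest-style forms ΔN = Σ(D_n - D_o)² / ΣD_n² and ΔF = ΣI(x_n ≠ x_o) / N_mis. It evaluates them only over the imputed (originally missing) entries of two successive iterations. The helper names and the value of `PRESET_DIFFERENCE` are hypothetical.

```python
from typing import Hashable, Sequence


def delta_n(new: Sequence[float], old: Sequence[float]) -> float:
    """Continuous criterion: sum of squared changes over sum of squared new values."""
    numerator = sum((n - o) ** 2 for n, o in zip(new, old))
    denominator = sum(n ** 2 for n in new)
    return numerator / denominator if denominator else 0.0


def delta_f(new: Sequence[Hashable], old: Sequence[Hashable]) -> float:
    """Discrete criterion: share of imputed entries whose category changed."""
    n_mis = len(new)  # N_mis: number of originally missing discrete entries
    changed = sum(1 for n, o in zip(new, old) if n != o)  # sum of I(x_n != x_o)
    return changed / n_mis if n_mis else 0.0


PRESET_DIFFERENCE = 1e-3  # hypothetical stopping threshold

# Imputed values at the missing positions in two successive iterations.
print(delta_n([3.0, 4.0], [0.0, 4.0]))                      # 0.36
print(delta_f(["a", "b", "b", "c"], ["a", "a", "b", "c"]))  # 0.25
print(delta_n([3.0, 4.0], [3.0, 4.0]) < PRESET_DIFFERENCE)  # True -> stop completing
```

Normalizing both quantities keeps the stopping threshold independent of the units and of the number of missing entries.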
In a possible implementation manner of the first aspect, the obtaining the first data specifically includes:
acquiring traffic state data of historical time periods of the road section to be predicted according to a preset selected value, wherein the traffic state data of each time period comprises traffic data, corresponding spatial features and time features, and the first data comprises the traffic state data of all the time periods;
the traffic data is data collected by a sensor, the spatial characteristic is a traffic state index of the road section to be predicted, and the time characteristic is a time state of the road section to be predicted.
In a possible implementation manner of the first aspect, the obtaining the second data specifically includes:
acquiring, according to a preset selection value, traffic state data of historical time periods for all turning movements in each direction of the upstream and downstream intersections, wherein the traffic state data of each time period comprise traffic data and corresponding spatial features, and the second data comprise the traffic state data of all the time periods;
the traffic data is acquired by a sensor, and the spatial characteristics are traffic state indexes of the upstream and downstream intersections.
In a second aspect, an embodiment of the present application provides a traffic state prediction method, including:
acquiring first data, second data and third data, wherein the first data comprise historical traffic state data of a road section to be predicted, the second data comprise historical traffic state data of an intersection upstream and downstream of the road section to be predicted, the third data comprise characteristics of the first data and spatial characteristics of the second data, and the characteristics comprise spatial characteristics and/or temporal characteristics;
fusing the first data, the second data and the third data to obtain fused data;
obtaining a traffic state prediction result of the road section to be predicted according to the fusion data by using a traffic state prediction model;
wherein the traffic state prediction model is the final traffic state prediction model obtained by training according to the method of any implementation of the first aspect.
In a third aspect, an embodiment of the present application provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any implementation of the first aspect or the second aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any implementation of the first aspect or the second aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on an electronic device, causes the electronic device to perform the method according to any implementation of the first aspect or the second aspect.
For the beneficial effects of the second to fifth aspects, refer to the related description of the first aspect; details are not repeated here.
Compared with the prior art, the embodiment of the application has the advantages that:
In the embodiments of the present application, first data, second data and third data are acquired, wherein the first data comprise historical traffic state data of a road section to be predicted, the second data comprise historical traffic state data of the intersections upstream and downstream of the road section to be predicted, the third data comprise features of the first data and features of the second data, and the features comprise spatial features and/or temporal features. The first data, the second data and the third data are fused to obtain training sample data. A traffic state prediction model is constructed based on the training sample data, and associated features, i.e. features whose importance is greater than a preset importance, are screened out so that highly important features are given additional consideration. The final traffic state prediction model is then established based on the associated features. Because the actual road network structure does not need to be simplified or subjected to assumptions, prediction accuracy, computational efficiency and robustness can be improved, the prediction results remain consistent with actual traffic conditions, and bidirectional traffic flow can be predicted accurately.
Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show merely some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for constructing a traffic state prediction model according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a method for constructing a traffic state prediction model according to another embodiment of the present application;
Fig. 3 is a schematic flowchart of a method for constructing a traffic state prediction model according to another embodiment of the present application;
Fig. 4 is a schematic flowchart of a traffic state prediction method according to another embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [described condition or event]" or "in response to detecting [described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless otherwise specifically stated.
Fig. 1 is a schematic flow chart of a method for constructing a traffic state prediction model according to an embodiment of the present application. By way of example and not limitation, as shown in fig. 1, the method comprises:
S101: acquiring first data, second data and third data;
the first data comprise historical traffic state data of the road section to be predicted, the second data comprise historical traffic state data of the intersections upstream and downstream of the road section to be predicted, and the third data comprise features of the first data and features of the second data, wherein the features comprise spatial features and/or temporal features.
In a possible implementation manner, traffic state data of historical time periods of a road section to be predicted are obtained according to preset selected values, the traffic state data of each time period comprise the traffic data and corresponding spatial features and time features, and the first data comprise the traffic state data of all the time periods.
Specifically, the preset selection value specifies how many past time periods of traffic state data are selected as historical traffic state data. For example, if the preset selection value is 13, the traffic state data of the past 13 time periods are selected as the historical traffic state data.
Wherein, the traffic data is data collected by a sensor. For example, the data collected by the sensor may include data collected by a camera device and a detector.
The spatial features are traffic state indexes of the road section to be predicted. Specifically, the spatial features are short-term traffic state indexes for each time period, namely the traffic states of the preceding time periods, selected backward according to a time step. For example, with a time step of 3, to predict the traffic state in time period t, the traffic states in time periods t-1, t-2 and t-3 are selected to form 3 features, which serve as the short-term traffic state indexes of the road section to be predicted. The traffic state may be the road section travel time, the traffic flow density, or the average travel speed on the road section.
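The short-term indexes described above amount to lag features. Here is a minimal sketch, assuming a single time-ordered series of one traffic state (for example average travel speed). The column names `state_t-1`, `state_t-2`, ... and the function name are hypothetical.

```python
from typing import Dict, List, Sequence


def lag_features(series: Sequence[float], step: int = 3) -> List[Dict[str, float]]:
    """Turn one time-ordered traffic state series into (lags, target) rows."""
    rows = []
    for t in range(step, len(series)):  # periods without `step` predecessors are skipped
        row = {f"state_t-{k}": series[t - k] for k in range(1, step + 1)}
        row["target"] = series[t]
        rows.append(row)
    return rows


for row in lag_features([42.0, 40.0, 35.0, 30.0, 33.0]):
    print(row)
```

With a preset selection value of 13, `step=13` would produce the 13-period history described for the first data.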
The temporal features describe the time state of the road section to be predicted. Specifically, the temporal feature is the long-term time state of each time period, which may include one or more of: month, day of week, hour, weekday/non-weekday, and peak/off-peak period. For example, the time state may be described as February, weekday, or rush hour. Because identical or similar temporal features tend to be associated with similar traffic states, adding temporal features allows the prediction model to learn nonlinear relationships and improves prediction accuracy.
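The temporal features listed above can be derived directly from a period's timestamp. This sketch makes two assumptions. Peak periods are taken as 07:00-09:00 and 17:00-19:00, which are hypothetical values, since real peak windows are city-specific. A weekday means Monday to Friday, with public holidays ignored. The function and key names are also hypothetical.

```python
from datetime import datetime

# Hypothetical peak windows as (start_hour, end_hour), end exclusive;
# real peak periods are city-specific and not given in the text.
PEAK_HOURS = ((7, 9), (17, 19))


def time_features(ts: datetime) -> dict:
    """Long-term temporal features of the time period starting at `ts`."""
    weekday = ts.isoweekday()  # 1 = Monday ... 7 = Sunday
    return {
        "month": ts.month,
        "day_of_week": weekday,
        "hour": ts.hour,
        "is_weekday": int(weekday <= 5),  # public holidays are ignored here
        "is_peak": int(any(start <= ts.hour < end for start, end in PEAK_HOURS)),
    }


print(time_features(datetime(2022, 2, 16, 8, 15)))
# {'month': 2, 'day_of_week': 3, 'hour': 8, 'is_weekday': 1, 'is_peak': 1}
```

This reproduces the "February, weekday, rush hour" example for a Wednesday morning.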
In a possible implementation, the traffic state data of historical time periods for all turning movements in each direction of the upstream and downstream intersections are obtained according to a preset selection value, the traffic state data of each time period comprise traffic data and corresponding spatial features, and the second data comprise the traffic state data of all the time periods.
Specifically, this preset selection value determines how many past time periods of traffic state data are selected as historical traffic state data. It may be the same as or different from the preset selection value used for the first data; in this embodiment it is also 13.
Each upstream and downstream intersection is treated as a node, and traffic state data are acquired for the east, west, south and north directions of the node. That is, for each of the three turning movements in each direction (through, left turn and right turn), the traffic state data of 13 time periods are selected as historical traffic state data. If a direction or turning movement does not exist at the intersection, the corresponding preset data are removed.
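The per-node feature layout just described can be enumerated as follows. This is a sketch under a hypothetical naming scheme (`<node>_<direction>_<movement>_t-<k>`) with one slot per traffic state index. Each direction/movement pair absent from `available` is skipped, mirroring the removal of preset data for a non-existent approach or turn.

```python
from itertools import product
from typing import List, Set, Tuple

DIRECTIONS = ("east", "west", "south", "north")
MOVEMENTS = ("through", "left", "right")


def intersection_feature_names(node: str, available: Set[Tuple[str, str]],
                               periods: int = 13) -> List[str]:
    """List one feature slot per (direction, movement, lagged period) at a node."""
    names = []
    for direction, movement in product(DIRECTIONS, MOVEMENTS):
        if (direction, movement) not in available:
            continue  # approach or turning movement does not exist at this node
        names.extend(f"{node}_{direction}_{movement}_t-{k}" for k in range(1, periods + 1))
    return names


# Usage: a T-junction whose north approach does not exist.
t_junction = {(d, m) for d in ("east", "west", "south") for m in MOVEMENTS}
print(len(intersection_feature_names("up1", t_junction)))  # 3 directions x 3 movements x 13 periods = 117
```

A full four-way intersection yields 4 x 3 x 13 = 156 slots per traffic state index.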
The traffic data are data collected by sensors. For example, they may include data collected by camera equipment, by loop detectors embedded in the road surface at the intersection, and by floating car GPS. The traffic data may also be fused data.
The spatial features are traffic state indexes of the upstream and downstream intersections, for example the average queue length, the number of stops, the stop delay and the average travel speed. The average queue length, number of stops and stop delay can be extracted by a convolutional neural network from the data collected by the camera equipment, and the average travel speed can be calculated from the floating car GPS data together with the corresponding speed measurements.
In this embodiment, the original data sources corresponding to the first data, the second data, and the third data are stored in a classified manner, so as to provide a basis for preprocessing, extracting, and analyzing the corresponding data sources.
S102: fusing the first data, the second data and the third data to obtain training sample data.
Optionally, the first data, the second data and the third data are aligned by time and then fused.
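Time-based fusion can be sketched as a join on timestamps. The sketch assumes each source is a mapping from an ISO-8601 timestamp string to a dictionary of feature values. It also assumes that only periods present in every source are kept (an inner join), a choice the text does not specify. The feature names are hypothetical.

```python
from typing import Dict, List, Mapping

Table = Mapping[str, Mapping[str, object]]  # {timestamp: {feature: value}}


def fuse_by_time(*tables: Table) -> List[Dict[str, object]]:
    """Inner-join feature tables on their timestamp keys, in time order."""
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)  # keep only periods present in every source
    fused = []
    for ts in sorted(common):  # same-format ISO-8601 strings sort chronologically
        row: Dict[str, object] = {"time": ts}
        for table in tables:
            row.update(table[ts])
        fused.append(row)
    return fused


first = {"2022-02-16T08:00": {"link_speed": 32.0}, "2022-02-16T08:05": {"link_speed": 30.5}}
second = {"2022-02-16T08:00": {"up1_east_left_queue": 12.0},
          "2022-02-16T08:05": {"up1_east_left_queue": 15.0},
          "2022-02-16T08:10": {"up1_east_left_queue": 9.0}}
third = {"2022-02-16T08:00": {"hour": 8}, "2022-02-16T08:05": {"hour": 8}}
for row in fuse_by_time(first, second, third):
    print(row)
```

An inner join drops the 08:10 period, which has no road-section data. Running the completion step first lets fewer periods be lost this way.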
S103: constructing a traffic state prediction model based on training sample data, and screening out associated features;
the associated feature is a feature having a degree of importance greater than a preset degree of importance.
A traffic state prediction model is constructed based on the loss function, using the features in the training sample data as split points. In this embodiment, the traffic state prediction model is an XGBoost (eXtreme Gradient Boosting) prediction model. The preset importance is set according to the requirements on model accuracy and/or the number of variables.
By extracting the features which have the greatest contribution to the traffic state of the road section to be predicted from the features and performing feature learning, the efficiency and the accuracy of the model on the feature learning can be improved, the redundant features are eliminated, and the robustness of the model is improved.
S104: establishing a final traffic state prediction model based on the associated features.
The final traffic state prediction model, i.e. the final XGBoost prediction model, is established based on the loss function, using the associated features as split points.
In this embodiment, first data, second data and third data are acquired, wherein the first data comprise historical traffic state data of a road section to be predicted, the second data comprise historical traffic state data of the intersections upstream and downstream of the road section to be predicted, the third data comprise features of the first data and features of the second data, and the features comprise spatial features and/or temporal features. The three kinds of data are fused to obtain training sample data, a traffic state prediction model is constructed based on the training sample data, and associated features, i.e. features whose importance is greater than a preset importance, are screened out so that highly important features receive additional consideration. The final traffic state prediction model is then established based on the associated features. Because the actual road network structure need not be simplified or subjected to assumptions, prediction accuracy, computational efficiency and robustness can be improved, the prediction results remain consistent with actual traffic conditions, and bidirectional traffic flow can be predicted accurately.
Fig. 2 is a schematic flow chart of a method for constructing a traffic state prediction model according to another embodiment of the present application. By way of example and not limitation, as shown in fig. 2, the method comprises:
S201: dividing the training sample data into a first test set and a first verification set according to a preset division ratio.
For example, with a preset division ratio of 4:1, the training sample data are divided into five parts, four of which form the first test set and one of which forms the first verification set.
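A 4:1 division of time-ordered samples might look like the sketch below. It assumes the samples are split chronologically rather than shuffled, so that the model is never checked on periods earlier than those it was fitted on. Following the document's terminology, the larger part is the "first test set" used for fitting. The function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_ratio(samples: Sequence[T], fit_parts: int = 4,
                   check_parts: int = 1) -> Tuple[List[T], List[T]]:
    """Split time-ordered samples into a leading fitting part and a trailing check part."""
    cut = len(samples) * fit_parts // (fit_parts + check_parts)
    return list(samples[:cut]), list(samples[cut:])


fit_set, check_set = split_by_ratio(list(range(10)))
print(fit_set, check_set)  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```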
S202: setting the features corresponding to the first verification set as decision attributes, setting the features corresponding to the first test set as condition attributes, and establishing and training a traffic state prediction model based on a preset loss function.
In one possible implementation, the predetermined loss function is a squared loss function;
the objective function of the traffic state prediction model is as follows:
$$\mathrm{obj} = \sum_{t}\left(y_t - \left(\hat{y}_t^{(t-1)} + f_t(x_t)\right)\right)^2 + \sum_{i}\Omega(f_i)$$
where $y_t$ is the actual traffic state value of the road section to be predicted at step t, $\hat{y}_t^{(t-1)}$ is the predicted value obtained by the traffic state prediction model at step t-1, $f_t(x_t)$ is the transformation function, and $x_t$ is the attribute. The transformation function may include XGBoost or random forest.
$\Omega(f_i)$ is the regularization term of the i-th tree:
$$\Omega(f_i) = \gamma M + \frac{1}{2}\lambda\sum_{j=1}^{M}\omega_j^{2}$$
where $\gamma$ is the threshold controlling node splitting, $\lambda$ is the L2 regularization weight, $\omega_j$ is the score of the j-th leaf, and $M$ is the number of leaves.
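To make the roles of γ and λ concrete, the sketch below evaluates the squared-loss objective with the standard XGBoost penalty Ω(f) = γM + ½λΣω² for given numbers. All names are hypothetical. `y_prev` holds the predictions after step t-1, `f_new` holds the output of the tree added at step t, and `trees` lists the leaf scores of each tree. This only evaluates the formula; it does not train a model.

```python
from typing import Sequence


def tree_penalty(leaf_scores: Sequence[float], gamma: float, lam: float) -> float:
    """Omega(f) = gamma * M + 0.5 * lambda * sum of squared leaf scores."""
    return gamma * len(leaf_scores) + 0.5 * lam * sum(w * w for w in leaf_scores)


def objective(y: Sequence[float], y_prev: Sequence[float], f_new: Sequence[float],
              trees: Sequence[Sequence[float]], gamma: float, lam: float) -> float:
    """Squared loss of the updated prediction plus the penalty of every tree."""
    loss = sum((yt - (pt + ft)) ** 2 for yt, pt, ft in zip(y, y_prev, f_new))
    return loss + sum(tree_penalty(leaves, gamma, lam) for leaves in trees)


print(tree_penalty([0.5, -0.5], gamma=1.0, lam=2.0))   # 2.5
print(objective([10.0, 12.0], [9.0, 11.0], [0.5, 0.5],
                trees=[[0.5, -0.5]], gamma=1.0, lam=2.0))  # 3.0
```

Each extra leaf costs γ, so a split is worthwhile only if it lowers the loss by more than that. λ shrinks large leaf scores.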
The objective function of the traffic state prediction model is built on the principle of minimizing the objective function obj, i.e. the loss plus the regularization term, with the trees fitted over the features by gradient boosting. The performance of the model is evaluated by the mean absolute percentage error (MAPE).
During training, the model parameters are optimized with GridSearchCV (a grid search method) to obtain tuned parameters. The parameters to be optimized include the maximum depth (max_depth), the learning rate (learning_rate), the regularization parameters (alpha, gamma, lambda), the total number of trees (n_estimators), and the like.
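The sketch below shows MAPE and the exhaustive search that grid search performs, in plain Python. In practice, scikit-learn's `GridSearchCV` wrapped around `xgboost.XGBRegressor` does this with cross-validation. Note that the scikit-learn wrapper names alpha and lambda `reg_alpha` and `reg_lambda`. Here, `evaluate` is a hypothetical callable standing in for "train with these parameters on the first test set and return the MAPE on the first verification set". The toy scorer below exists only to exercise the search.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error (in %), skipping zero actual values."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def grid_search(param_grid: Dict[str, List],
                evaluate: Callable[[Dict], float]) -> Tuple[Optional[Dict], float]:
    """Try every parameter combination and keep the one with the lowest score."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params: Dict) -> float:
    """Toy stand-in for 'train with params, return validation MAPE'."""
    return abs(params["max_depth"] - 4) + params["learning_rate"]


print(mape([100.0, 200.0], [110.0, 180.0]))  # 10.0
print(grid_search({"max_depth": [3, 4, 5], "learning_rate": [0.1, 0.3]}, toy_evaluate))
```

The number of combinations grows multiplicatively with each parameter added, which is why the embodiments below fix some parameters to speed up the search.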
S203: During training, calculate the importance of each feature.
Specifically, the score by which the feature improves the prediction model at each split is obtained, and a squared weighting of these scores is calculated; this squared weighting is the importance of the corresponding feature. The calculation is:

$$S_i=\sum_{t=1}^{K}\left(G_i^{(t)}-G_i^{(t-1)}\right)^2$$

where $S_i$ is the importance of the $i$-th feature, $K$ is the number of splits on that feature during training, $G_i^{(t)}$ is the score of the prediction model at the $t$-th split on the $i$-th feature, and $G_i^{(t-1)}$ is the score of the prediction model at the $(t-1)$-th split on the $i$-th feature.
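The squared-improvement importance can be accumulated split by split. The sketch assumes each split is recorded as a (feature, score before split, score after split) tuple; that record format and the feature names are hypothetical, and how the model score at each split is defined is not specified in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def squared_improvement_importance(
        splits: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """S_i = sum over all splits on feature i of (score_after - score_before) ** 2."""
    importance: Dict[str, float] = defaultdict(float)
    for feature, score_before, score_after in splits:
        importance[feature] += (score_after - score_before) ** 2
    return dict(importance)


splits = [("avg_speed", 1.0, 3.0), ("avg_speed", 3.0, 4.0), ("stop_count", 2.0, 2.5)]
print(squared_improvement_importance(splits))  # {'avg_speed': 5.0, 'stop_count': 0.25}
```

Squaring rewards a few large improvements more than many small ones: the first avg_speed split contributes 4.0, the second only 1.0.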
S204: If the importance of a feature is greater than the preset importance, select that feature as an associated feature.
For example, with the preset importance set to 10%, any feature whose importance exceeds 10% is selected as an associated feature.
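Applying a 10% threshold requires importances on a common scale. The sketch assumes each feature's importance is expressed as its share of the total importance over all features; that normalization, the function name, and the example values are assumptions, not taken from the source.

```python
from typing import Dict, List


def select_associated_features(importance: Dict[str, float],
                               threshold: float = 0.10) -> List[str]:
    """Keep features whose share of total importance exceeds the threshold."""
    total = sum(importance.values())
    if total <= 0:
        return []
    return sorted(name for name, s in importance.items() if s / total > threshold)


scores = {"avg_speed": 5.0, "stop_count": 0.25, "hour_of_day": 1.0, "upstream_flow": 0.3}
print(select_associated_features(scores))  # ['avg_speed', 'hour_of_day']
```

With a total of 6.55, avg_speed holds about 76% and hour_of_day about 15%, while the other two fall below 5%.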
In another embodiment, parameters such as the fraction of samples used for modeling (subsample), the sampling ratio within each tree (colsample_bytree), and the boosting method (tree_boost) can also be set or tuned to obtain a better-tuned parameter set.
In another embodiment, parameter optimization and feature screening can be accelerated by setting the regularization parameters alpha and lambda to 0, the fraction of samples used for modeling to 0.5, and the sampling ratio within each tree to 0.8.
In another embodiment, when the prediction model needs to learn from temporal features, the temporal features are not screened. Only the spatial features for every direction of the upstream and downstream intersections are screened, and those whose importance exceeds the preset importance are kept as associated spatial features.
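For this embodiment, a sketch might keep every temporal feature unconditionally and threshold only the spatial ones. Normalizing by the total over all features, the function name, and the feature names are assumptions.

```python
from typing import Dict, List, Set


def screen_spatial_only(importance: Dict[str, float], temporal: Set[str],
                        threshold: float = 0.10) -> List[str]:
    """Keep all temporal features; keep spatial features whose importance share exceeds threshold."""
    total = sum(importance.values())
    kept_temporal = sorted(f for f in importance if f in temporal)
    kept_spatial = sorted(
        f for f, s in importance.items()
        if f not in temporal and total > 0 and s / total > threshold
    )
    return kept_temporal + kept_spatial


importance = {"hour_of_day": 0.2, "weekday": 0.1,
              "north_left_turn_flow": 3.0, "south_through_flow": 0.3}
print(screen_spatial_only(importance, temporal={"hour_of_day", "weekday"}))
# ['hour_of_day', 'weekday', 'north_left_turn_flow']
```

Note that weekday is kept even though its share (about 3%) is below the threshold, which is exactly the bypass this embodiment describes.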
In another embodiment, the feature importance indicators of the XGBoost prediction model can be used to screen the features of the training sample data. Specifically, the contribution of each feature is quantified using gain, cover, or total gain (total_gain) as the evaluation index, and features whose contribution exceeds a preset index threshold are taken as associated features.
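With the xgboost package, per-feature scores of these kinds can be read from a trained booster, for example booster.get_score(importance_type="gain"), which returns a dict mapping feature names to scores ("cover" and "total_gain" are also accepted values). To stay runnable without xgboost, the sketch below only filters such a dict; the threshold, the optional normalization to shares, and the example scores are hypothetical.

```python
from typing import Dict, List


def filter_by_index(scores: Dict[str, float], threshold: float,
                    normalize: bool = True) -> List[str]:
    """Return features whose (optionally normalized) importance index exceeds threshold.

    `scores` has the shape of Booster.get_score() output: {feature_name: score}.
    """
    total = sum(scores.values()) if normalize else 1.0
    if total <= 0:
        return []
    return sorted(f for f, v in scores.items() if v / total > threshold)


gain_scores = {"f0": 12.0, "f1": 3.0, "f2": 85.0}  # hypothetical gain values
print(filter_by_index(gain_scores, threshold=0.10))  # ['f0', 'f2']
```

Raw gain and cover values are not bounded, so comparing them against a fixed threshold is easier after normalizing to shares; pass normalize=False to threshold the raw values instead.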
S205: Build an initial traffic state prediction model using the associated features.
The initial traffic state prediction model is established using the optimized parameter set and the associated features.
S206: Divide the training sample data into a second test set and a second verification set according to the preset division ratio.
For example, with a preset division ratio of 4:1, the training sample data is randomly split so that four parts form the second test set and one part forms the second verification set. The first and second test sets may contain the same or different training samples, and likewise for the first and second verification sets.
S207: Train the initial traffic state prediction model with the second test set and the second verification set to obtain the final traffic state prediction model.
During training, GridSearchCV continues to be used with the second test set and the second verification set to optimize the parameters of the initial traffic state prediction model. The parameters to be optimized include the maximum depth (max_depth), the learning rate (learning_rate), the regularization parameters (alpha, gamma, lambda), the total number of trees (n_estimators), and so on.
The optimized model is then trained further on the training sample data, and its performance is verified by 10-fold cross-validation, yielding the associated features and the final traffic state prediction model.
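10-fold cross-validation rotates which tenth of the data is held out and averages the resulting scores. The standard-library sketch below builds contiguous folds; the fit/predict callables and the mean-value baseline are hypothetical placeholders for the traffic state prediction model.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds; return (train_idx, val_idx) pairs."""
    if not 1 < k <= n_samples:
        raise ValueError("need 1 < k <= n_samples")
    folds: List[Tuple[List[int], List[int]]] = []
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)  # spread the remainder
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, val_idx))
        start += size
    return folds


def cross_validate(x: Sequence, y: Sequence[float], fit: Callable, predict: Callable,
                   metric: Callable, k: int = 10) -> float:
    """Average the metric over k held-out folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(x), k):
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [x[i] for i in val_idx])
        scores.append(metric([y[i] for i in val_idx], preds))
    return sum(scores) / len(scores)


def fit_mean(xs, ys):
    """Hypothetical baseline 'model': the mean of the training targets."""
    return sum(ys) / len(ys)


def predict_mean(model, xs):
    return [model] * len(xs)


def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


print(cross_validate(list(range(23)), [5.0] * 23, fit_mean, predict_mean, mae))  # 0.0
```

Contiguous folds keep neighbouring time periods together; shuffled folds are an equally valid reading, since the text does not say how the folds are formed.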
In another embodiment, parameters such as the fraction of samples used for modeling (subsample), the sampling ratio within each tree (colsample_bytree), and the boosting method (tree_boost) can also be optimized with GridSearchCV to obtain a better-tuned parameter set.
In another embodiment, parameter optimization can be accelerated by setting the fraction of samples used for modeling to 0.5 and the sampling ratio within each tree to 0.8.
In another embodiment, the spatial features for each direction of the upstream and downstream intersections can be screened again, keeping those whose importance exceeds the preset importance as associated spatial features.
Fig. 3 is a schematic flowchart of a method for constructing a traffic state prediction model according to another embodiment of the present application. By way of example and not limitation, as shown in Fig. 3, fusing the first data, the second data, and the third data to obtain the training sample data includes:
S301: Complete the first data and the second data to obtain the completed first data and second data, as follows.
Specifically, each feature is classified as a continuous variable or a discrete variable according to its attributes.
For example, the average passing speed is a continuous variable and the number of stops is a discrete variable.
The features are then sorted according to their attribute type and the total amount of missing data for each feature.
For example, all continuous variables are sorted and numbered by their total amount of missing data, and all discrete variables are likewise sorted and numbered by their total amount of missing data.
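The classification and ordering step can be sketched as below. Which features count as discrete is passed in explicitly (the text derives it from each feature's attributes), None marks a missing value, and sorting in ascending order of missing count is an assumption, since the text does not state the direction.

```python
from typing import Dict, List, Optional, Set, Tuple


def order_by_missing(data: Dict[str, List[Optional[float]]],
                     discrete: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (continuous, discrete) feature names, each sorted by missing-value count."""
    def n_missing(name: str) -> int:
        return sum(1 for v in data[name] if v is None)

    # Ties are broken by name so the numbering is deterministic.
    continuous = sorted((f for f in data if f not in discrete), key=lambda f: (n_missing(f), f))
    discrete_sorted = sorted((f for f in data if f in discrete), key=lambda f: (n_missing(f), f))
    return continuous, discrete_sorted


data = {"avg_speed": [30.0, None, None], "delay": [5.0, None, 7.0],
        "stop_count": [None, 1, 2], "phase_id": [1, 2, 3]}
print(order_by_missing(data, discrete={"stop_count", "phase_id"}))
# (['delay', 'avg_speed'], ['phase_id', 'stop_count'])
```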
If a feature is a continuous variable, its missing values are initialized with the median of the adjacent time periods or of all time periods.
Initializing with the median of the adjacent time periods means using the median of the time periods adjacent to the time period in which the data is missing.
And/or, if a feature is a discrete variable, its missing values are initialized with the mode of the adjacent time periods or of all time periods.
Initializing with the mode of the adjacent time periods means using the mode of the time periods adjacent to the time period in which the data is missing.
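A minimal sketch of this initialization for one feature's time series follows. The neighbourhood width (window, here one period on each side), the fallback to all periods when no neighbour is observed, and the function name are assumptions; it relies on Python 3.8+ behaviour, where statistics.mode returns the first mode for multimodal data instead of raising.

```python
import statistics
from typing import List, Optional


def init_missing(series: List[Optional[float]], continuous: bool,
                 window: int = 1) -> List[Optional[float]]:
    """Fill None entries of one feature's time series.

    Continuous features use the median, discrete features the mode, of the observed
    values within `window` periods on each side; if none are observed there, the
    statistic over all periods is used instead.
    """
    stat = statistics.median if continuous else statistics.mode
    observed_all = [v for v in series if v is not None]
    filled: List[Optional[float]] = list(series)
    for t, value in enumerate(series):
        if value is not None:
            continue
        lo, hi = max(0, t - window), min(len(series), t + window + 1)
        neighbours = [series[s] for s in range(lo, hi) if s != t and series[s] is not None]
        filled[t] = stat(neighbours if neighbours else observed_all)
    return filled


print(init_missing([30.0, None, 34.0, 36.0], continuous=True))   # [30.0, 32.0, 34.0, 36.0]
print(init_missing([2, 2, None, 2, 5, None], continuous=False))  # [2, 2, 2, 2, 5, 5]
```

Neighbours are taken from the original series, not from already-filled values, so the result does not depend on the order in which gaps are visited.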
In another embodiment, initializing the missing values of a continuous variable with the median of the adjacent time periods or of all time periods specifically includes the following.
Each time the missing values of each continuous variable are initialized with the median, a first new data set is obtained for that variable. The first old data set of each continuous variable before the current initialization, that is, the first new data set produced by the previous initialization, is also retained.
For example, in each initialization round, some of the missing values in each continuous variable are initialized with the median.
After each initialization, the difference between each first new data set and the corresponding first old data set is calculated, and the differences are summed to obtain a first sum, as follows:

$$\Delta N=\frac{\sum_{j}\left(D_n-D_o\right)^2}{\sum_{j}D_n^2}$$

where $\Delta N$ is the first sum, $j$ is the index of the sorted continuous variables, $D_n$ denotes the values filled in for the missing continuous-variable entries in the first new data set, and $D_o$ denotes the values filled in for the missing continuous-variable entries in the first old data set.
If the first sum is smaller than the preset difference, completion stops, and the completed first new data set is used as the final data set for modeling.
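A sketch of the continuous-variable stopping check follows. It assumes the first sum is the normalized squared change used by the missForest imputation method, ΔN = Σ(D_n − D_o)² / ΣD_n², computed over the imputed entries only; the update function in the example is a hypothetical stand-in for one re-initialization round, and the 1% tolerance is the example value given later in the text.

```python
from typing import Callable, List, Sequence


def delta_n(new_vals: Sequence[float], old_vals: Sequence[float]) -> float:
    """Normalized squared change between two rounds of imputed continuous values."""
    num = sum((n - o) ** 2 for n, o in zip(new_vals, old_vals))
    den = sum(n ** 2 for n in new_vals)
    return num / den if den else 0.0


def impute_until_stable(initial: List[float],
                        update: Callable[[List[float]], List[float]],
                        tol: float = 0.01, max_rounds: int = 50) -> List[float]:
    """Repeat update rounds until delta_n falls below tol (or max_rounds is reached)."""
    current = list(initial)
    for _ in range(max_rounds):
        new = update(current)
        if delta_n(new, current) < tol:
            return new
        current = new
    return current


# Hypothetical update that moves each imputed value halfway toward 100.
result = impute_until_stable([0.0], lambda v: [x + (100 - x) / 2 for x in v])
print(result)  # [93.75]
```

The successive changes are 1.0, 0.111, 0.020 and 0.0044, so the loop stops on the fourth round; max_rounds guards against updates that never settle.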
And/or, initializing the missing values of a discrete variable with the mode of the adjacent time periods or of all time periods specifically includes the following.
Each time the missing values of each discrete variable are initialized with the mode, a second new data set is obtained for that variable. The second old data set of each discrete variable before the current initialization, that is, the second new data set produced by the previous initialization, is also retained.
For example, in each initialization round, some of the missing values in each discrete variable are initialized with the mode.
After each initialization, the difference between each second new data set and the corresponding second old data set is calculated, and the differences are summed to obtain a second sum, as follows:

$$\Delta F=\frac{\sum_{j}\sum_{i}I\left(x_n\neq x_o\right)}{N_{mis}}$$

where $\Delta F$ is the second sum, $j$ is the index of the sorted continuous variables, $i$ is the index of the sorted discrete variables, $x_n$ is the value filled in for a missing discrete-variable entry in the second new data set, $x_o$ is the corresponding value in the second old data set, $I$ is the indicator function ($I=1$ if $x_n\neq x_o$, otherwise $I=0$), and $N_{mis}$ is the total number of missing entries in the discrete variables.
If the second sum is smaller than the preset difference, completion stops, and the completed second new data set is used as the final data set for modeling.
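The discrete-variable check counts how many imputed entries changed between rounds and divides by N_mis, the total number of missing entries. The dict-of-lists layout (one list of imputed values per discrete variable), the function name, and the variable names are assumptions.

```python
from typing import Dict, Hashable, List


def delta_f(new_imputed: Dict[str, List[Hashable]],
            old_imputed: Dict[str, List[Hashable]]) -> float:
    """Share of imputed discrete entries whose value changed: sum of I(x_n != x_o) / N_mis."""
    changed = 0
    n_mis = 0
    for name, new_vals in new_imputed.items():
        old_vals = old_imputed[name]
        if len(new_vals) != len(old_vals):
            raise ValueError(f"length mismatch for {name}")
        n_mis += len(new_vals)
        changed += sum(1 for x_n, x_o in zip(new_vals, old_vals) if x_n != x_o)
    return changed / n_mis if n_mis else 0.0


new = {"stop_count": [2, 3, 3], "signal_phase": ["A", "B"]}
old = {"stop_count": [2, 2, 3], "signal_phase": ["A", "A"]}
print(delta_f(new, old))  # 0.4
```

With the example preset difference of 1%, this round would not stop, since 0.4 is well above 0.01.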
The preset difference can be chosen according to actual conditions; for example, it may be 1%.
S302: Fuse the completed first data and second data with the third data to obtain the training sample data.
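The text does not specify how the three sources are joined. One plausible sketch aligns them on a shared time-period key and prefixes column names by source to avoid collisions; the keying scheme, the prefixes, the inner-join behaviour, and the function name are all assumptions.

```python
from typing import Any, Dict, Hashable, List

Record = Dict[str, Any]


def fuse(first: Dict[Hashable, Record], second: Dict[Hashable, Record],
         third: Dict[Hashable, Record]) -> List[Record]:
    """Inner-join three {time_period: {feature: value}} mappings into flat rows."""
    periods = sorted(set(first) & set(second) & set(third))
    rows = []
    for period in periods:
        row: Record = {"period": period}
        for prefix, source in (("road", first), ("intersection", second), ("feature", third)):
            row.update({f"{prefix}_{name}": value for name, value in source[period].items()})
        rows.append(row)
    return rows


first = {1: {"speed": 32.0}, 2: {"speed": 30.0}}
second = {1: {"flow": 120}, 2: {"flow": 140}}
third = {1: {"hour": 8}}
print(fuse(first, second, third))
# [{'period': 1, 'road_speed': 32.0, 'intersection_flow': 120, 'feature_hour': 8}]
```

An inner join drops period 2 because the third source has no record for it; running the completion step first keeps such gaps from silently shrinking the training set.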
Fig. 4 is a schematic flowchart of a traffic state prediction method according to another embodiment of the present application. By way of example and not limitation, the method can be applied to a traffic command system, an emergency real-time command and dispatch system, an emergency decision-support system, and a mobile-map traffic state visualization system. As shown in Fig. 4, the method includes:
S401: Acquire the first data, the second data, and the third data.
The first data include historical traffic state data of the road section to be predicted; the second data include historical traffic state data of the intersections upstream and downstream of the road section to be predicted; and the third data include features of the first data and features of the second data, where the features include spatial features and/or temporal features.
S402: Fuse the first data, the second data, and the third data to obtain fused data.
S403: Use the traffic state prediction model to obtain the traffic state prediction result for the road section to be predicted from the fused data.
The traffic state prediction model is the final traffic state prediction model trained by any one of the methods described above.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic and should not limit the implementation of the embodiments of the present application in any way.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in Fig. 5, the electronic device 5 of this embodiment includes at least one processor 50 (only one is shown in Fig. 5), a memory 51, and a computer program 52 stored in the memory 51 and executable on the at least one processor 50. When the processor 50 executes the computer program 52, the steps of any of the method embodiments described above are implemented.
The electronic device 5 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The electronic device may include, but is not limited to, the processor 50 and the memory 51. Those skilled in the art will appreciate that Fig. 5 is merely an example of the electronic device 5 and does not limit it; the device may include more or fewer components than shown, combine certain components, or use different components, such as input/output devices and network access devices.
The processor 50 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
In some embodiments, the memory 51 may be an internal storage unit of the electronic device 5, such as its hard disk or internal memory. In other embodiments, the memory 51 may be an external storage device of the electronic device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 5. Further, the memory 51 may include both an internal storage unit and an external storage device of the electronic device 5. The memory 51 is used to store an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 51 may also be used to temporarily store data that has been output or is to be output.
It should be noted that the information exchange and execution processes between the above devices/units are based on the same concept as the method embodiments of the present application; for their specific functions and technical effects, refer to the method embodiments, which are not repeated here.
It will be apparent to those skilled in the art that, for convenience and brevity of description, the above division into functional units and modules is used only as an illustration. In practical applications, the above functions may be assigned to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated unit may be implemented in hardware or as a software functional unit. In addition, the specific names of the functional units and modules are used only to distinguish them from one another and are not intended to limit the protection scope of the present application. For the specific working processes of the units and modules in the system, refer to the corresponding processes in the foregoing method embodiments, which are not repeated here.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps in the foregoing method embodiments.
An embodiment of the present application provides a computer program product which, when run on a mobile terminal, causes the mobile terminal to implement the steps in the above method embodiments.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing apparatus/electronic device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.
Each of the above embodiments is described with its own emphasis; for parts not detailed or illustrated in one embodiment, refer to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the apparatus/network device embodiments described above are merely illustrative: the division into modules or units is only a logical division, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (13)

1. A traffic state prediction model construction method, characterized by comprising the following steps:
acquiring first data, second data, and third data, wherein the first data comprise historical traffic state data of a road section to be predicted, the second data comprise historical traffic state data of the intersections upstream and downstream of the road section to be predicted, the third data comprise features of the first data and features of the second data, and the features comprise spatial features and/or temporal features;
fusing the first data, the second data and the third data to obtain training sample data;
constructing a traffic state prediction model based on the training sample data and screening out associated features, wherein the associated features are features whose importance is greater than a preset importance;
and establishing a final traffic state prediction model based on the associated features.
2. The method of claim 1, wherein constructing a traffic state prediction model based on the training sample data and screening out associated features comprises:
dividing the training sample data into a first test set and a first verification set according to a preset division ratio;
setting the features corresponding to the first verification set as decision attributes, setting the features corresponding to the first test set as condition attributes, and establishing and training the traffic state prediction model based on a preset loss function;
calculating the importance of the features during training;
and if the importance of a feature is greater than the preset importance, selecting that feature as an associated feature.
3. The method of claim 2, wherein calculating the importance of the features comprises:
obtaining the score by which the feature improves the traffic state prediction model at each split;
calculating a squared weighting of the scores.
4. The method of claim 1, wherein building a final traffic state prediction model based on the associated features comprises:
establishing an initial traffic state prediction model using the associated features;
dividing the training sample data into a second test set and a second verification set according to a preset division ratio;
and training to obtain a final traffic state prediction model based on the second test set, the second verification set and the initial traffic state prediction model.
5. The method of claim 2, wherein the preset loss function is the squared loss function, and the objective function of the traffic state prediction model is:

$$\mathrm{obj}^{(t)}=\sum\left[y_t-\left(\hat{y}^{(t-1)}+f_t(x_t)\right)\right]^2+\sum_{i}\Omega(f_i)$$

wherein $y_t$ is the actual traffic state value of the road section to be predicted at step $t$, $\hat{y}^{(t-1)}$ is the value predicted by the traffic state prediction model at step $t-1$, $f_t(x_t)$ is a transformation function, $x_t$ is an attribute, and $\Omega(f_i)$ is the regularization term of the $i$-th tree:

$$\Omega(f_i)=\gamma M+\frac{1}{2}\lambda\sum_{j=1}^{M}\omega_j^2$$

wherein $\gamma$ is the threshold that controls node splitting, $\lambda$ is the L2 regularization weight, $\omega_j$ is the score of the $j$-th leaf, and $M$ is the number of leaves.
6. The method of claim 1, wherein fusing the first data, the second data, and the third data to obtain training sample data comprises:
completing the first data and the second data to obtain the completed first data and second data;
and fusing the completed first data and second data with the third data to obtain the training sample data.
7. The method of claim 6, wherein completing the first data and the second data comprises:
classifying each feature as a continuous variable or a discrete variable according to its attributes;
sorting the features according to their attributes and the total amount of missing data of each feature;
if a feature is a continuous variable, initializing its missing values with the median of the adjacent time periods or of all time periods;
and/or, if a feature is a discrete variable, initializing its missing values with the mode of the adjacent time periods or of all time periods.
8. The method of claim 7, wherein, if a feature is a continuous variable, initializing its missing values with the median of the adjacent time periods or of all time periods specifically comprises:
obtaining a first new data set each time the missing values of each continuous variable are initialized with the median;
calculating the difference between each first new data set and the corresponding first old data set, and summing the differences to obtain a first sum;
if the first sum is smaller than a preset difference, stopping the completion;
and/or, if a feature is a discrete variable, initializing its missing values with the mode of the adjacent time periods or of all time periods specifically comprises:
obtaining a second new data set each time the missing values of each discrete variable are initialized with the mode;
calculating the difference between each second new data set and the corresponding second old data set, and summing the differences to obtain a second sum;
and if the second sum is smaller than the preset difference, stopping the completion.
9. The method of claim 8, wherein calculating the difference between each first new data set and the corresponding first old data set and summing the differences to obtain the first sum specifically comprises calculating:

$$\Delta N=\frac{\sum_{j}\left(D_n-D_o\right)^2}{\sum_{j}D_n^2}$$

wherein $\Delta N$ is the first sum, $j$ is the index of the sorted continuous variables, $D_n$ denotes the values filled in for the missing continuous-variable entries in the first new data set, and $D_o$ denotes the values filled in for the missing continuous-variable entries in the first old data set;
and/or calculating the difference between each second new data set and the corresponding second old data set and summing the differences to obtain the second sum specifically comprises calculating:

$$\Delta F=\frac{\sum_{j}\sum_{i}I\left(x_n\neq x_o\right)}{N_{mis}}$$

wherein $\Delta F$ is the second sum, $j$ is the index of the sorted continuous variables, $i$ is the index of the sorted discrete variables, $x_n$ is the value filled in for a missing discrete-variable entry in the second new data set, $x_o$ is the corresponding value in the second old data set, $I$ is the indicator function ($I=1$ if $x_n\neq x_o$, otherwise $I=0$), and $N_{mis}$ is the total number of missing entries in the discrete variables.
10. The method of claim 1, wherein acquiring the first data specifically comprises:
acquiring traffic state data for historical time periods of the road section to be predicted according to a preset selection value, wherein the traffic state data of each time period comprise traffic data and the corresponding spatial features and temporal features, and the first data comprise the traffic state data of all the time periods;
wherein the traffic data are data collected by sensors, the spatial features are traffic state indices of the road section to be predicted, and the temporal features are time states of the road section to be predicted.
11. The method of claim 1, wherein acquiring the second data specifically comprises:
acquiring traffic state data for historical time periods of all turning movements in each direction of the upstream and downstream intersections according to a preset selection value, wherein the traffic state data of each time period comprise traffic data and the corresponding spatial features, and the second data comprise the traffic state data of all the time periods;
wherein the traffic data are collected by sensors, and the spatial features are traffic state indices of the upstream and downstream intersections.
12. A traffic state prediction method, comprising:
acquiring first data, second data and third data, wherein the first data comprise historical traffic state data of a road section to be predicted, the second data comprise historical traffic state data of an upstream intersection and a downstream intersection of the road section to be predicted, the third data comprise characteristics of the first data and characteristics of the second data, and the characteristics comprise spatial characteristics and/or temporal characteristics;
fusing the first data, the second data and the third data to obtain fused data;
obtaining a traffic state prediction result of the road section to be predicted from the fused data by using a traffic state prediction model;
wherein the traffic state prediction model is a final traffic state prediction model trained by the method of any one of claims 1 to 11.
13. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 11 or the method of claim 12.
CN202210170462.2A 2021-11-12 2022-02-23 Traffic state prediction model construction method and traffic state prediction method Active CN114596702B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021113423540 2021-11-12
CN202111342354 2021-11-12

Publications (2)

Publication Number Publication Date
CN114596702A true CN114596702A (en) 2022-06-07
CN114596702B CN114596702B (en) 2023-07-04

Family

ID=81804490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210170462.2A Active CN114596702B (en) 2021-11-12 2022-02-23 Traffic state prediction model construction method and traffic state prediction method

Country Status (1)

Country Link
CN (1) CN114596702B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024082533A1 (en) * 2022-10-17 2024-04-25 京东城市(北京)数字科技有限公司 Training method and apparatus for spatio-temporal data processing model, spatio-temporal data processing method and apparatus, and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100063715A1 (en) * 2007-01-24 2010-03-11 International Business Machines Corporation Method and structure for vehicular traffic prediction with link interactions and missing real-time data
CN109300310A (en) * 2018-11-26 2019-02-01 平安科技(深圳)有限公司 A kind of vehicle flowrate prediction technique and device
CN110826774A (en) * 2019-10-18 2020-02-21 广州供电局有限公司 Bus load prediction method and device, computer equipment and storage medium
CN110853347A (en) * 2019-10-14 2020-02-28 深圳市综合交通运行指挥中心 Short-time traffic road condition prediction method and device and terminal equipment
CN111738474A (en) * 2019-03-25 2020-10-02 京东数字科技控股有限公司 Traffic state prediction method and device
CN113096404A (en) * 2021-04-23 2021-07-09 中南大学 Road blockade oriented quantitative calculation method for change of traffic flow of road network
CN113379156A (en) * 2021-06-30 2021-09-10 南方科技大学 Speed prediction method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhong Ying; Shao Yiming; Wu Wenwen; Hu Guangxue: "Short-term traffic flow prediction model based on XGBoost", Science Technology and Engineering, no. 30 *

Also Published As

Publication number Publication date
CN114596702B (en) 2023-07-04

Similar Documents

Publication Title
CN110210604B (en) Method and device for predicting movement track of terminal equipment
CN108921200B (en) Method, apparatus, device and medium for classifying driving scene data
CN112700072B (en) Traffic condition prediction method, electronic device, and storage medium
CN109087510B (en) Traffic monitoring method and device
CN112015843B (en) Driving risk situation assessment method and system based on multi-vehicle intention interaction result
KR20200115063A (en) Method of determining quality of map trajectory matching data, device, server and medium
CN112419710B (en) Traffic congestion data prediction method, traffic congestion data prediction device, computer equipment and storage medium
CN110751828B (en) Road congestion measuring method and device, computer equipment and storage medium
CN110991311A (en) Target detection method based on dense connection deep network
CN111680362A (en) Method, device and equipment for acquiring automatic driving simulation scene and storage medium
CN110264270B (en) Behavior prediction method, behavior prediction device, behavior prediction equipment and storage medium
CN111368887B (en) Training method of thunderstorm weather prediction model and thunderstorm weather prediction method
CN112906823B (en) Target object recognition model training method, recognition method and recognition device
CN111815098A (en) Traffic information processing method and device based on extreme weather, storage medium and electronic equipment
CN112686466A (en) Subway passenger path confirmation method and device
CN114360239A (en) Traffic prediction method and system for multilayer space-time traffic knowledge map reconstruction
CN110021161B (en) Traffic flow direction prediction method and system
CN113159403A (en) Method and device for predicting pedestrian track at intersection
CN114596709B (en) Data processing method, device, equipment and storage medium
CN114596702A (en) Traffic state prediction model construction method and traffic state prediction method
CN110134754B (en) Method, device, server and medium for predicting operation duration of region interest point
CN115392548A (en) Travel demand prediction method, device and storage medium for travel site
CN113159457A (en) Intelligent path planning method and system and electronic equipment
CN109934496B (en) Method, device, equipment and medium for determining inter-area traffic influence
Shin et al. Statistical evaluation of different sample sizes for local calibration process in the highway safety manual

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant