CN114596702A

CN114596702A - Traffic state prediction model construction method and traffic state prediction method

Info

Publication number: CN114596702A
Application number: CN202210170462.2A
Authority: CN
Inventors: 杨丽丽; 孟繁宇; 曾益萍; 袁狄平; 王倩倩
Original assignee: Southern University of Science and Technology
Current assignee: Southern University of Science and Technology
Priority date: 2021-11-12
Filing date: 2022-02-23
Publication date: 2022-06-07
Anticipated expiration: 2042-02-23
Also published as: CN114596702B

Abstract

The application is applicable to the technical field of intelligent traffic, and provides a traffic state prediction model construction method and a traffic state prediction method. The model construction method comprises the following steps: acquiring first data, second data and third data, wherein the first data comprise historical traffic state data of a road section to be predicted, the second data comprise historical traffic state data of an upstream intersection and a downstream intersection of the road section to be predicted, the third data comprise characteristics of the first data and characteristics of the second data, and the characteristics comprise spatial characteristics and/or temporal characteristics; fusing the first data, the second data and the third data to obtain training sample data; constructing a traffic state prediction model based on training sample data, and screening out associated features, wherein the associated features are features with the importance degree greater than a preset importance degree; and a final traffic state prediction model is established based on the associated characteristics, so that the prediction precision can be improved, the prediction result is ensured to be consistent with the actual traffic condition, and the condition of the bidirectional traffic flow can be accurately predicted.

Description

Traffic state prediction model construction method and traffic state prediction method

Technical Field

The application belongs to the technical field of intelligent traffic, and particularly relates to a traffic state prediction model construction method and a traffic state prediction method.

Background

At present, modern traffic management and emergency resource scheduling require up-to-date information on the running state of road traffic, so that the traffic state of the whole network can be understood globally and a decision maker can be helped to formulate schemes such as traffic jam dispersion, accident disposal and rescue path planning. Generally, acquiring and visualizing the road network operation state depends on accurate prediction of the traffic states of road sections and intersections, including their speeds, flow rates, passing times and the like.

However, conventional traffic state prediction methods generally capture the spatial correlation of traffic information only through a correlation matrix that is estimated or learned from the road network topology and historical traffic data. This requires extensive simplification of, and strong assumptions about, the actual road network structure, and leaves actual traffic conditions and secondary or unknown factors out of consideration. As a result, the prediction precision of the road network operation state may be reduced, which affects the schemes formulated by a decision maker.

Disclosure of Invention

The embodiment of the application provides a traffic state prediction model construction method and a traffic state prediction method, and the problem of low prediction precision can be solved.

In a first aspect, an embodiment of the present application provides a method for constructing a traffic state prediction model, including:

acquiring first data, second data and third data, wherein the first data comprise historical traffic state data of a road section to be predicted, the second data comprise historical traffic state data of an intersection upstream and downstream of the road section to be predicted, the third data comprise characteristics of the first data and characteristics of the second data, and the characteristics comprise spatial characteristics and/or temporal characteristics;

fusing the first data, the second data and the third data to obtain training sample data;

constructing a traffic state prediction model based on the training sample data, and screening out associated features, wherein the associated features are features with the importance degree greater than a preset importance degree;

and establishing a final traffic state prediction model based on the associated characteristics.

In a possible implementation manner of the first aspect, training a prediction model and screening out associated features specifically includes:

dividing the training sample data into a first test set and a first verification set according to a preset dividing proportion;

setting the characteristics corresponding to the first verification set as decision attributes, setting the characteristics corresponding to the first test set as condition attributes, and establishing and training the traffic state prediction model based on a preset loss function;

in the training process, calculating the importance of the features;

and if the importance is greater than the preset importance, selecting the feature as the associated feature.

Further, calculating the importance of the features specifically includes:

acquiring the score of the feature for improving the traffic state prediction model during each segmentation;

calculating a square weighting of the score.

In a possible implementation manner of the first aspect, establishing a final traffic state prediction model based on the associated features specifically includes:

establishing an initial traffic state prediction model using the associated features;

dividing the training sample data into a second test set and a second verification set according to a preset dividing proportion;

and training to obtain a final traffic state prediction model based on the second test set, the second verification set and the initial traffic state prediction model.

Further, the preset loss function is a square loss function;

the objective function of the traffic state prediction model is as follows:

$$obj = \sum_{t}\left(y_t - \left(\hat{y}_{t-1} + f_t(x_t)\right)\right)^2 + \sum_{i}\Omega(f_i)$$

$$\Omega(f) = \gamma M + \frac{1}{2}\lambda\sum_{j=1}^{M}\omega_j^2$$

wherein $y_t$ is the actual traffic state value corresponding to the road section to be predicted in step t, $\hat{y}_{t-1}$ is the predicted value obtained by the traffic state prediction model in step t-1, $f_t(x_t)$ is a transformation function, $x_t$ is an attribute, $\Omega(f_i)$ is the regularization operation of the ith tree, γ is the threshold for controlling node splitting, λ is the L2 regularization weight, ω is the score of the leaves, and M is the number of leaves.
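As a rough numerical illustration, the following sketch evaluates a squared-loss objective with a tree regularization term of the form Ω(f) = γM + ½λΣω². This is the standard XGBoost form and is assumed here to correspond to the objective function described above; the function names and all toy values are hypothetical.

```python
def tree_regularization(leaf_scores, gamma, lam):
    """Omega(f) = gamma * M + 0.5 * lam * sum(w^2), where M is the number of leaves."""
    m = len(leaf_scores)
    return gamma * m + 0.5 * lam * sum(w * w for w in leaf_scores)


def objective(y_true, y_prev, f_t, trees, gamma, lam):
    """Squared loss of (previous prediction + new tree output) plus regularization.

    y_true: actual traffic state values
    y_prev: predictions carried over from step t-1
    f_t:    outputs of the tree added at step t
    trees:  list of leaf-score lists, one list per tree
    """
    loss = sum((y - (p + f)) ** 2 for y, p, f in zip(y_true, y_prev, f_t))
    reg = sum(tree_regularization(w, gamma, lam) for w in trees)
    return loss + reg


# Toy values, made up for illustration only.
y_true = [30.0, 42.0]
y_prev = [28.0, 40.0]
f_t = [1.0, 3.0]
trees = [[1.0, 3.0]]  # a single tree with two leaves
obj = objective(y_true, y_prev, f_t, trees, gamma=0.1, lam=1.0)
```

In this toy case the loss term is 2.0 and the regularization term is 5.2, which shows how γ penalizes the number of leaves while λ penalizes large leaf scores.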

Further, fusing the first data, the second data, and the third data to obtain training sample data, including:

completing the first data and the second data to obtain the completed first data and second data;

and fusing the completed first data and second data with the third data to obtain training sample data.

In a possible implementation manner of the first aspect, the completing the first data and the second data specifically includes:

classifying each feature as a continuous variable or a discrete variable according to the attribute of the feature;

sorting the features according to the total amount of missing data of each feature and the attribute of the feature;

if the feature is a continuous variable, initializing the values of the missing data by using the median of adjacent time periods or of all time periods;

and/or, if the feature is a discrete variable, initializing the values of the missing data by using the mode of adjacent time periods or of all time periods.

Further, if the variable is a continuous variable, initializing a value of missing data by using median of adjacent time periods or all time periods, specifically including:

respectively obtaining a first new data set after the value of the missing data of each continuous variable is initialized by a median each time;

calculating the difference between each first new data set and the corresponding first old data set, and summing to obtain a first sum;

if the first sum is smaller than a preset difference value, stopping completing;

and/or, if the variable is a discrete variable, initializing the value of the missing data by using the mode of the adjacent time periods or all the time periods, specifically including:

respectively obtaining a second new data set after initializing the value of the missing data of each discrete variable by using a mode each time;

calculating the difference between each second new data set and the corresponding second old data set, and summing to obtain a second sum;

and if the second sum is smaller than the preset difference, stopping completing.

For example, calculating a difference between each of the first new data sets and the corresponding first old data set, and summing the differences to obtain a first sum, specifically includes:

the calculation is performed according to the following formula:

where $\Delta N$ is the first sum, j is the sequence number of the sorted continuous variables, $D_n$ is the missing continuous variable values of the first new data set, and $D_o$ is the missing continuous variable values of the first old data set;

and/or calculating the difference between each new second data set and the corresponding old second data set, and summing the differences to obtain a second sum, specifically comprising:

the calculation is performed according to the following formula:

where $\Delta F$ is the second sum, j is the sequence number of the sorted continuous variables, i is the sequence number of the sorted discrete variables, $x_n$ is the missing discrete variable values of the second new data set, $x_o$ is the missing discrete variable values of the second old data set, I is the decision function, which takes the value 1 if $x_n \neq x_o$ and 0 otherwise, and $N_{mis}$ is the total number of missing items in the discrete variables.
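The formulas themselves are not reproduced in this text. The sketch below shows one plausible reading built only from the symbols defined above: the first sum is taken as the sum of squared differences between the new and old imputed continuous values, normalized by the sum of squared new values, and the second sum is taken as the share of imputed discrete entries that changed. The squaring and the normalization of the first sum are assumptions borrowed from the missForest stopping criterion, and the function names are hypothetical.

```python
def delta_continuous(new_values, old_values):
    """Assumed form: sum((D_n - D_o)^2) / sum(D_n^2) over imputed continuous entries."""
    num = sum((n - o) ** 2 for n, o in zip(new_values, old_values))
    den = sum(n ** 2 for n in new_values)
    return num / den if den else 0.0


def delta_discrete(new_values, old_values):
    """Assumed form: sum(I(x_n != x_o)) / N_mis over imputed discrete entries."""
    n_mis = len(new_values)
    changed = sum(1 for n, o in zip(new_values, old_values) if n != o)
    return changed / n_mis if n_mis else 0.0


# Imputed values of two consecutive completion rounds (toy numbers).
d_n = delta_continuous([10.0, 20.0], [10.0, 18.0])
d_f = delta_discrete([2, 3, 3, 1], [2, 3, 4, 1])

# Completion stops once a sum falls below the preset difference (value assumed here).
PRESET_DIFFERENCE = 0.01
stop_continuous = d_n < PRESET_DIFFERENCE
```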

In a possible implementation manner of the first aspect, the obtaining the first data specifically includes:

acquiring traffic state data of historical time periods of the road section to be predicted according to a preset selected value, wherein the traffic state data of each time period comprises traffic data, corresponding spatial features and time features, and the first data comprises the traffic state data of all the time periods;

the traffic data is data collected by a sensor, the spatial characteristic is a traffic state index of the road section to be predicted, and the time characteristic is a time state of the road section to be predicted.

In a possible implementation manner of the first aspect, the obtaining the second data specifically includes:

acquiring traffic state data of historical time periods of all steering in each direction of the upstream and downstream intersections according to preset selected values, wherein the traffic state data of each time period comprises traffic data and corresponding spatial features, and the second data comprises the traffic state data of all time periods;

the traffic data is acquired by a sensor, and the spatial characteristics are traffic state indexes of the upstream and downstream intersections.

In a second aspect, an embodiment of the present application provides a traffic state prediction method, including:

acquiring first data, second data and third data, wherein the first data comprise historical traffic state data of a road section to be predicted, the second data comprise historical traffic state data of an intersection upstream and downstream of the road section to be predicted, the third data comprise characteristics of the first data and spatial characteristics of the second data, and the characteristics comprise spatial characteristics and/or temporal characteristics;

fusing the first data, the second data and the third data to obtain fused data;

obtaining a traffic state prediction result of the road section to be predicted according to the fusion data by using a traffic state prediction model;

wherein the traffic state prediction model is a final traffic state prediction model obtained by training according to the method of any one of the above first aspect.

In a third aspect, an embodiment of the present application provides an electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of the first aspect or the second aspect described above.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, including: the computer-readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements a method as in any one of the above first aspects or the above second aspect.

In a fifth aspect, embodiments of the present application provide a computer program product, which, when run on an electronic device, causes the electronic device to perform the method of any one of the above first aspects or the above second aspect.

It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.

Compared with the prior art, the embodiment of the application has the advantages that:

according to the embodiment of the application, first data, second data and third data are obtained, the first data comprise historical traffic state data of a road section to be predicted, the second data comprise historical traffic state data of upstream and downstream intersections of the road section to be predicted, the third data comprise characteristics of the first data and characteristics of the second data, and the characteristics comprise spatial characteristics and/or temporal characteristics; fusing the first data, the second data and the third data to obtain training sample data; constructing a traffic state prediction model based on training sample data, screening out associated features, wherein the associated features are features with the importance degree greater than a preset importance degree, and adding consideration to features with high importance degrees; the final traffic state prediction model is established based on the associated characteristics, the actual road network structure does not need to be simplified and assumed, the prediction precision, the operation efficiency and the robustness can be improved, the prediction result is ensured to be consistent with the actual traffic condition, and the condition of the bidirectional traffic flow can be accurately predicted.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flow chart of a method for constructing a traffic state prediction model according to an embodiment of the present application;

Fig. 2 is a schematic flow chart of a method for constructing a traffic state prediction model according to another embodiment of the present application;

Fig. 3 is a schematic flow chart of a method for constructing a traffic state prediction model according to another embodiment of the present application;

Fig. 4 is a schematic flow chart of a traffic state prediction method according to another embodiment of the present application;

Fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [described condition or event]" or "in response to detecting [described condition or event]".

Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.

Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless otherwise specifically stated.

Fig. 1 is a schematic flow chart of a method for constructing a traffic state prediction model according to an embodiment of the present application. By way of example and not limitation, as shown in fig. 1, the method comprises:

S101: acquiring first data, second data and third data;

the first data comprise historical traffic state data of the road section to be predicted, the second data comprise historical traffic state data of an intersection on the upstream and the downstream of the road section to be predicted, and the third data comprise characteristics of the first data and characteristics of the second data, wherein the characteristics comprise spatial characteristics and/or temporal characteristics.

In a possible implementation manner, traffic state data of historical time periods of a road section to be predicted are obtained according to preset selected values, the traffic state data of each time period comprise the traffic data and corresponding spatial features and time features, and the first data comprise the traffic state data of all the time periods.

Specifically, the preset selection value specifies how many past time periods of traffic state data are selected as the historical traffic state data. For example, if the preset selection value is 13, the traffic state data of 13 time periods is selected as the historical traffic state data.

Wherein, the traffic data is data collected by a sensor. For example, the data collected by the sensor may include data collected by a camera device and a detector.

The spatial characteristics are traffic state indexes of the road section to be predicted. Specifically, the spatial characteristics are the short-term traffic state indexes of each time period. The short-term traffic state index is the traffic state of the preceding time periods, selected backward according to the time step. Illustratively, with a time step of 3, if the traffic state in time period t is to be predicted, the traffic states in time periods t-1, t-2 and t-3 are selected to form 3 features serving as the short-term traffic state indexes of the road section to be predicted. The traffic state may be: link transit time, traffic flow density, or link average transit speed.
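The backward selection with a time step of 3 can be sketched as follows. This is a minimal illustration in plain Python; the function name `make_lag_features` and the sample travel times are hypothetical.

```python
def make_lag_features(series, time_step=3):
    """Pair each target value at period t with the values at t-1, ..., t-time_step."""
    rows = []
    for t in range(time_step, len(series)):
        lags = [series[t - k] for k in range(1, time_step + 1)]
        rows.append((lags, series[t]))
    return rows


# Link transit times (seconds) for five consecutive time periods (toy values).
travel_times = [50, 52, 55, 60, 58]
samples = make_lag_features(travel_times, time_step=3)
```

Each entry of `samples` holds the three short-term features followed by the value to be predicted; the first `time_step` periods yield no sample because they lack a full history.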

The time characteristic is a time state of the road section to be predicted. Specifically, the time characteristic is the long-term time state of each time period. The time state may include one or more of: month, week, hour, weekday/non-weekday, peak/off-peak period. For example, the time state may be described as February, weekday, or rush hour. Because the same or similar time characteristics tend to bring similar traffic states, taking the time characteristics into account allows the prediction model to learn nonlinear relationships, which improves the prediction accuracy.
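A possible way to derive such time states from a timestamp is sketched below. The set of peak hours and the Monday-to-Friday definition of a weekday are assumptions made for the example; the text does not specify them.

```python
from datetime import datetime

# Assumed peak hours; the source does not define which hours count as peak.
PEAK_HOURS = {7, 8, 17, 18}


def time_features(ts):
    """Return the long-term time state of the period that contains `ts`."""
    is_weekday = ts.isoweekday() <= 5  # Monday=1 ... Friday=5 (assumption)
    return {
        "month": ts.month,
        "day_of_week": ts.isoweekday(),
        "hour": ts.hour,
        "is_weekday": is_weekday,
        "is_peak": is_weekday and ts.hour in PEAK_HOURS,
    }


features = time_features(datetime(2022, 2, 23, 8, 15))
```

Here the timestamp falls in February, on a Wednesday, during the assumed morning rush hour.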

In a possible implementation manner, traffic state data of all turning historical time periods in each direction of the upstream and downstream intersections are obtained according to preset selected values, the traffic state data of each time period comprise traffic data and corresponding spatial features, and the second data comprise the traffic state data of all the time periods.

Specifically, the preset selection value specifies how many past time periods of traffic state data are selected as the historical traffic state data. This preset selection value may be the same as or different from the one used for the first data. In this embodiment, it is also 13.

Each upstream and downstream intersection is set as a node, and traffic state data are acquired in the east, west, south and north directions of the node; that is, for each of the three turning movements (going straight, turning left and turning right) in each direction, the traffic state data of 13 time periods are selected as historical traffic state data. If a direction or a turning movement does not exist, the corresponding data are removed.
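The enumeration of directions and turning movements at a node can be sketched as follows, assuming the last sentence above means that movements absent at an intersection are dropped. The names `DIRECTIONS`, `TURNS` and `movement_keys` are hypothetical.

```python
from itertools import product

DIRECTIONS = ("east", "west", "south", "north")
TURNS = ("straight", "left", "right")


def movement_keys(missing=()):
    """List (direction, turn) pairs of a node, skipping absent directions or movements.

    `missing` may contain whole directions (e.g. "north") or single
    movements given as (direction, turn) tuples.
    """
    missing = set(missing)
    return [
        (d, t)
        for d, t in product(DIRECTIONS, TURNS)
        if d not in missing and (d, t) not in missing
    ]


all_moves = movement_keys()  # a full four-way intersection
partial = movement_keys(missing=["north", ("east", "left")])
```

With a preset selection value of 13, each remaining pair would contribute 13 historical time periods of traffic state data.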

The traffic data is data collected by sensors. For example, it may include data collected by camera equipment, by loop detectors deployed in the pavement at an intersection, and by floating-car GPS. The traffic data may be fused data.

The spatial characteristics are traffic state indexes of the upstream and downstream intersections. For example, the traffic state indexes include the average queue length, the number of stops, the stop delay and the average passing speed. The average queue length, the number of stops and the stop delay can be extracted from data collected by the camera equipment through a convolutional neural network, and the average passing speed can be calculated from data collected by floating-car GPS and the corresponding speed measurement results.

In this embodiment, the original data sources corresponding to the first data, the second data, and the third data are stored in a classified manner, so as to provide a basis for preprocessing, extracting, and analyzing the corresponding data sources.

S102: fusing the first data, the second data and the third data to obtain training sample data.

Optionally, the first data, the second data and the third data are corresponded and fused by time.
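A minimal sketch of time-based fusion is given below. It assumes each data source is a mapping from a time period label to a feature dictionary and keeps only the periods present in all three sources; this inner-join behaviour and all field names are assumptions made for the example.

```python
def fuse_by_time(first, second, third):
    """Merge three {time_period: {feature: value}} mappings on their common periods."""
    common = sorted(set(first) & set(second) & set(third))
    fused = []
    for ts in common:
        row = {"time": ts}
        for source in (first, second, third):
            row.update(source[ts])
        fused.append(row)
    return fused


# Toy data: road section, upstream intersection, and derived characteristics.
first = {"08:00": {"link_speed": 32.0}, "08:05": {"link_speed": 30.5}}
second = {
    "08:00": {"up_queue_len": 12.0},
    "08:05": {"up_queue_len": 15.0},
    "08:10": {"up_queue_len": 9.0},
}
third = {"08:00": {"is_peak": True}, "08:05": {"is_peak": True}}
fused = fuse_by_time(first, second, third)
```

The period "08:10" is dropped because it appears in only one source; a real pipeline might instead keep it and pass the gaps to the completion step.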

S103: constructing a traffic state prediction model based on training sample data, and screening out associated features;

the associated feature is a feature having a degree of importance greater than a preset degree of importance.

A traffic state prediction model is constructed based on the loss function, with the features in the training sample data used as segmentation points. In the present embodiment, the traffic state prediction model is an XGBoost (eXtreme Gradient Boosting) prediction model. The preset importance is set according to the requirements on model precision and/or the number of variables.

By extracting the features which have the greatest contribution to the traffic state of the road section to be predicted from the features and performing feature learning, the efficiency and the accuracy of the model on the feature learning can be improved, the redundant features are eliminated, and the robustness of the model is improved.

S104: establishing a final traffic state prediction model based on the associated features.

A final traffic state prediction model, namely the final XGBoost prediction model, is established based on the loss function, with the associated features used as segmentation points.

In the embodiment, first data, second data and third data are acquired, wherein the first data comprise historical traffic state data of a road section to be predicted, the second data comprise historical traffic state data of an intersection upstream and downstream of the road section to be predicted, and the third data comprise characteristics of the first data and characteristics of the second data, and the characteristics comprise spatial characteristics and/or temporal characteristics; fusing the first data, the second data and the third data to obtain training sample data; constructing a traffic state prediction model based on training sample data, screening out associated features, wherein the associated features are features with the importance degree greater than a preset importance degree, and adding consideration to features with high importance degrees; the final traffic state prediction model is established based on the associated characteristics, the actual road network structure does not need to be simplified and assumed, the prediction precision, the operation efficiency and the robustness can be improved, the prediction result is ensured to be consistent with the actual traffic condition, and the condition of the bidirectional traffic flow can be accurately predicted.

Fig. 2 is a schematic flow chart of a method for constructing a traffic state prediction model according to another embodiment of the present application. By way of example and not limitation, as shown in fig. 2, the method comprises:

S201: dividing the training sample data into a first test set and a first verification set according to a preset dividing proportion.

For example, the preset dividing proportion is 4:1, that is, the training sample data are divided into 4 parts for the first test set and 1 part for the first verification set.
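The 4:1 division can be sketched as below. Shuffling before the cut and the fixed seed are assumptions made for reproducibility of the example; the set names follow the wording of this embodiment.

```python
import random


def split_samples(samples, ratio=(4, 1), seed=0):
    """Split samples into two parts according to `ratio` after a seeded shuffle."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * ratio[0] // sum(ratio)
    return shuffled[:cut], shuffled[cut:]


first_test_set, first_verification_set = split_samples(range(10))
```

With ten samples, eight go to the first test set and two to the first verification set.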

S202: setting the features corresponding to the first verification set as decision attributes, setting the features corresponding to the first test set as condition attributes, and establishing and training a traffic state prediction model based on a preset loss function.

In one possible implementation, the predetermined loss function is a squared loss function;

the objective function of the traffic state prediction model is as follows:

$$obj = \sum_{t}\left(y_t - \left(\hat{y}_{t-1} + f_t(x_t)\right)\right)^2 + \sum_{i}\Omega(f_i)$$

wherein $y_t$ is the actual traffic state value corresponding to the road section to be predicted in step t, $\hat{y}_{t-1}$ is the predicted value obtained by the traffic state prediction model in step t-1, $f_t(x_t)$ is a transformation function, and $x_t$ is an attribute. The transformation function may include: XGBoost, random forest.

$\Omega(f_i)$ is the regularization operation of the ith tree.

The objective function of the traffic state prediction model is established based on the idea of minimizing the loss function (that is, optimizing the objective function obj) by using a gradient descent algorithm and the features. The performance of the model is characterized using the Mean Absolute Percentage Error (MAPE).
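For reference, MAPE can be computed as in the following sketch. Skipping samples whose actual value is zero is an assumption made here to avoid division by zero; the text does not say how such samples are handled.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent, ignoring zero actual values."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all actual values are zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Actual vs. predicted average passing speeds (toy values).
error = mape([50.0, 40.0], [45.0, 44.0])
```

Both toy predictions are off by 10% of the actual value, so the resulting MAPE is about 10%.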

In the training process, the model parameters are optimized with the GridSearchCV algorithm (a grid search method) to obtain an optimized parameter tuning result. The parameters to be optimized include: maximum depth (max_depth), learning rate (learning_rate), regularization parameters (alpha, gamma, lambda), total number of trees (n_estimators), and the like.
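The idea behind a grid search can be sketched in plain Python as an exhaustive loop over all parameter combinations. This is a simplified stand-in, not scikit-learn's GridSearchCV: the `toy_score` function is hypothetical and merely replaces the cross-validated error that a real search would compute by training a model for each combination.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in `param_grid` and keep the one with the lowest score."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 200],
}


def toy_score(params):
    """Hypothetical error surface with its minimum at depth 5, rate 0.1, 200 trees."""
    return (
        abs(params["max_depth"] - 5)
        + abs(params["learning_rate"] - 0.1)
        + abs(params["n_estimators"] - 200) / 100
    )


best_params, best_score = grid_search(param_grid, toy_score)
```

The cost grows with the product of the grid sizes (12 combinations here), which is why later embodiments fix some parameters to speed up the search.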

S203: in the training process, the importance of the features is calculated.

Specifically, the score by which the feature improves the prediction model at each segmentation is obtained, and a square weighting of the scores is calculated. The square weighting of the scores is the importance of the corresponding feature.

The calculation is performed according to the following formula:

wherein $S_i$ is the importance of the ith feature, K is the number of segmentations performed on the corresponding feature during training, and the two remaining terms of the formula are the scores of the prediction model at the t-th segmentation and at the (t-1)-th segmentation of the ith feature, respectively.

S204: if the importance is greater than the preset importance, selecting the feature as the associated feature.

For example, the preset importance is set to 10%, and if the importance of the feature is greater than 10%, the feature is selected as the associated feature.
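The threshold-based screening can be sketched as follows. Normalizing the raw importance values so that they sum to 1 before comparing them with the 10% threshold is an assumption; the feature names and values are hypothetical.

```python
def select_associated_features(importance, threshold=0.10):
    """Keep the features whose share of the total importance exceeds `threshold`."""
    total = sum(importance.values())
    shares = {name: value / total for name, value in importance.items()}
    return [name for name, share in shares.items() if share > threshold]


# Raw importance values per feature (toy numbers).
raw_importance = {
    "lag_1": 50.0,
    "lag_2": 25.0,
    "upstream_queue_length": 20.0,
    "month": 5.0,
}
associated = select_associated_features(raw_importance)
```

Here `month` holds only 5% of the total importance and is therefore screened out.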

In another embodiment, parameters such as the total sample ratio used for modeling (subsample), the sample ratio within each tree (colsample_bytree), and the boost method (tree_boost) can be set or optimized to obtain a better parameter tuning result.

In another embodiment, the parameter optimization process and feature screening process can be accelerated by setting alpha and lambda to 0 in the regularization parameters, the total sample ratio for modeling to be 0.5, and the sample ratio within each tree to be 0.8.

In another embodiment, when the prediction model is required to have learning capability on the time features, the time features are not screened, the spatial features in all directions of the upstream and downstream intersections are screened, and the spatial features with the importance degree greater than the preset importance degree are screened out to serve as the associated spatial features.

In another embodiment, the feature importance indicators of the XGBoost prediction model may be used to screen the features of the training sample data. Specifically, the contribution of the features of the training sample data is quantitatively evaluated by using gain (gain), coverage (cover) or total gain (total_gain) as evaluation indexes, and the features whose contribution is larger than a preset index threshold are used as associated features.

S205: an initial traffic state prediction model is built using the associated features.

And establishing an initial traffic state prediction model by using the optimized parameter adjusting result and the associated characteristics.

S206: dividing the training sample data into a second test set and a second verification set according to a preset dividing proportion.

For example, the preset dividing proportion is 4:1, that is, the training sample data are randomly divided into 4 parts for the second test set and 1 part for the second verification set. The training sample data of the first test set and the second test set may be the same or different, and the training sample data of the first verification set and the second verification set may be the same or different.

S207: training to obtain a final traffic state prediction model based on the second test set, the second verification set and the initial traffic state prediction model.

In the training process, based on the second test set and the second verification set, the GridSearchCV algorithm is again used to optimize the parameters of the initial traffic state prediction model. The parameters to be optimized include: maximum depth (max_depth), learning rate (learning_rate), regularization parameters (alpha, gamma, lambda), total number of subtrees (n_estimators), and the like.

And then continuing to train the optimized model by using the training sample data, and performing model performance verification by adopting 10-fold cross inspection to obtain the associated characteristics and the final traffic state prediction model.
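The 10-fold procedure can be illustrated with a small index generator. This is a sketch under assumptions: the folds are contiguous and unshuffled, and `fit_and_score` is a hypothetical callback that would train on the training indices and return a score on the validation indices.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


def cross_validate(n_samples, fit_and_score, k=10):
    """Average the score returned by fit_and_score over the k folds."""
    scores = [fit_and_score(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(25, k=10))
# Toy callback: the "score" is just the validation fold size.
mean_fold_size = cross_validate(25, lambda train, val: len(val))
```

Each sample is used for validation exactly once and for training in the other nine folds, which is what makes the averaged score a fair estimate of model performance.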

In another embodiment, parameters such as the total sample ratio used for modeling (subsample), the sample ratio within each tree (colsample_bytree) and the boosting method (tree_boost) can also be optimized by the GridSearchCV algorithm to obtain a better parameter tuning result.

In another embodiment, the parameter optimization process is accelerated by setting the total sample ratio used for modeling to 0.5 and the sample ratio within each tree to 0.8.
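The effect of fixing the two sampling ratios can be seen by enumerating the remaining search grid. This is a minimal sketch: the candidate values are illustrative and are not given in the application, and the XGBoost names `subsample` and `colsample_bytree` are assumed to correspond to the two ratios.

```python
from itertools import product


def expand_grid(grid):
    """Return every parameter combination in the grid as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


# Illustrative candidate values only; the source does not list them.
search_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 300],
}

# Held constant so that they do not multiply the number of candidates.
fixed_params = {"subsample": 0.5, "colsample_bytree": 0.8}

candidates = [{**fixed_params, **combo} for combo in expand_grid(search_grid)]
print(len(candidates))  # 12
```

Searching three candidate values for each of the two fixed ratios as well would raise the count from 12 to 108, which is why holding them constant shortens the optimization.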

In another embodiment, the spatial features of the upstream and downstream intersections in each direction may be screened again, and the spatial features with the importance degree greater than the preset importance degree are screened out as the associated spatial features.

Fig. 3 is a schematic flow chart of a method for constructing a traffic state prediction model according to another embodiment of the present application. As an example and not by way of limitation, as shown in fig. 3, fusing the first data, the second data, and the third data to obtain training sample data includes:

S301: Completing the first data and the second data to obtain the completed first data and second data, including:

Specifically, each feature is classified as a continuous variable or a discrete variable according to its attribute.

For example, the average passing speed is a continuous variable and the number of stops is a discrete variable; the features are classified accordingly.

The features are then sorted according to the total amount of missing data of each feature and according to its attribute.

Illustratively, all continuous variables are sorted and numbered according to their total amount of missing data, and all discrete variables are separately sorted and numbered according to their total amount of missing data.
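The grouping and numbering step can be sketched as follows. This is an illustration under assumptions: missing values are represented by `None`, the sort is ascending (the text does not state the direction), and all feature names other than average speed and number of stops are hypothetical.

```python
def sort_by_missing(columns, kinds):
    """Group features by attribute and order each group by ascending missing count."""
    groups = {"continuous": [], "discrete": []}
    for name, values in columns.items():
        missing = sum(1 for v in values if v is None)
        groups[kinds[name]].append((missing, name))
    return {kind: [name for _, name in sorted(items)] for kind, items in groups.items()}


columns = {
    "avg_speed": [31.0, None, 28.5, None],
    "travel_time": [42.0, 40.0, None, 45.0],
    "num_stops": [1, None, None, None],
    "queue_count": [2, 3, None, 2],
}
kinds = {
    "avg_speed": "continuous",
    "travel_time": "continuous",
    "num_stops": "discrete",
    "queue_count": "discrete",
}

ranking = sort_by_missing(columns, kinds)
print(ranking)
```

The position of a feature within its list plays the role of its number; continuous and discrete variables are numbered independently.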

If the variable is a continuous variable, the value of the missing data is initialized with the median of the adjacent time periods or of all time periods.

Initializing the value of the missing data with the median of the adjacent time periods specifically means using the median of the time periods adjacent to the time period in which the data is missing.

And/or,

if the variable is a discrete variable, the value of the missing data is initialized with the mode of the adjacent time periods or of all time periods.

Initializing the value of the missing data with the mode of the adjacent time periods specifically means using the mode of the time periods adjacent to the time period in which the data is missing.
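The two initialization rules can be sketched with the standard library. This is a minimal sketch under assumptions: missing values are `None`, "adjacent time periods" is interpreted as a symmetric window of `window` periods on either side of the gap, and the function falls back to all periods when no neighbour is observed. None of these details is specified in the application.

```python
from statistics import median, mode


def initialize_missing(series, kind, window=None):
    """Fill None entries with the median (continuous) or the mode (discrete).

    window=None uses all time periods; window=w uses the observed values
    within w periods on either side of the gap.
    """
    observed = [v for v in series if v is not None]
    aggregate = median if kind == "continuous" else mode
    filled = []
    for i, value in enumerate(series):
        if value is not None:
            filled.append(value)
            continue
        pool = observed
        if window is not None:
            nearby = [v for v in series[max(0, i - window): i + window + 1] if v is not None]
            pool = nearby or observed  # fall back to all periods if no neighbour exists
        filled.append(aggregate(pool))
    return filled


speeds = [30.0, None, 34.0, 50.0, None]  # continuous: average passing speed
stops = [1, 1, None, 2, 1]               # discrete: number of stops

filled_speeds = initialize_missing(speeds, "continuous", window=1)
filled_stops = initialize_missing(stops, "discrete")
print(filled_speeds)  # [30.0, 32.0, 34.0, 50.0, 50.0]
print(filled_stops)   # [1, 1, 1, 2, 1]
```

The median is robust to outlying speeds, and the mode keeps a discrete variable on a value that actually occurs, which is the usual motivation for treating the two attribute types differently.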

In another embodiment, if the variable is a continuous variable, initializing the value of the missing data with the median of the adjacent time periods or of all time periods specifically includes:

Each time the missing data values of each continuous variable are initialized with the median, a first new data set is obtained. The first old data set of each continuous variable before the current initialization, namely the first new data set obtained by the previous initialization, is also obtained.

For example, in each initialization, part of the missing data values in each continuous variable are initialized with the median.

After each initialization, the difference between each first new data set and the corresponding first old data set is calculated, and the differences are summed to obtain a first sum.

The calculation is performed according to the following formula:

where ΔN is the first sum, j is the index of the sorted continuous variables, D_n is the missing continuous variable value of the first new data set, and D_o is the missing continuous variable value of the first old data set.

If the first sum is smaller than the preset difference, the completion stops, and the completed first new data set is used as the final data set for modeling.
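The stopping test for continuous variables can be sketched as below. The formula itself is not reproduced in this text, so the form used here is an assumption: a squared difference between the new and old imputed values, summed over all continuous variables and normalized by the squared new values, as in missForest-style imputation. The function names and the sample values are hypothetical.

```python
def first_sum(new_sets, old_sets):
    """Assumed form: sum_j (D_n - D_o)^2 / sum_j D_n^2 over the imputed entries."""
    numerator = 0.0
    denominator = 0.0
    for j, new_values in new_sets.items():
        for d_n, d_o in zip(new_values, old_sets[j]):
            numerator += (d_n - d_o) ** 2
            denominator += d_n ** 2
    return numerator / denominator if denominator else 0.0


def completion_finished(new_sets, old_sets, preset_difference=0.01):
    """Stop completing once the first sum falls below the preset difference."""
    return first_sum(new_sets, old_sets) < preset_difference


# Imputed values of one continuous variable (index 1) in two successive rounds.
old_round = {1: [30.0, 40.0]}
small_change = {1: [33.0, 44.0]}   # 25 / 3025, about 0.0083
large_change = {1: [36.0, 48.0]}   # 100 / 3600, about 0.0278

print(completion_finished(small_change, old_round))  # True
print(completion_finished(large_change, old_round))  # False
```

The default threshold of 0.01 mirrors the 1% example value given for the preset difference; normalizing by the new values makes that threshold independent of the units of the variable.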

And/or, if the variable is a discrete variable, initializing the value of the missing data with the mode of the adjacent time periods or of all time periods specifically includes:

Each time the missing data values of each discrete variable are initialized with the mode, a second new data set is obtained. The second old data set of each discrete variable before the current initialization, namely the second new data set obtained by the previous initialization, is also obtained.

For example, in each initialization, part of the missing data values in each discrete variable are initialized with the mode.

After each initialization, the difference between each second new data set and the corresponding second old data set is calculated, and the differences are summed to obtain a second sum.

The calculation is performed according to the following formula:

where ΔF is the second sum, j is the index of the sorted continuous variables, i is the index of the sorted discrete variables, x_n is the missing discrete variable value of the second new data set, x_o is the missing discrete variable value of the second old data set, I is the indicator function, which takes 1 if x_n ≠ x_o and 0 otherwise, and N_mis is the total number of missing items in the discrete variables.

If the second sum is smaller than the preset difference, the completion stops, and the completed second new data set is used as the final data set for modeling.
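The corresponding test for discrete variables can be sketched in the same way. The formula is not reproduced in this text, so the form below is an assumption built from the symbol definitions: the indicator I(x_n ≠ x_o) is summed over all imputed entries of all discrete variables and divided by N_mis. The sample values are hypothetical.

```python
def second_sum(new_sets, old_sets):
    """Assumed form: sum_i sum I(x_n != x_o) / N_mis over the imputed entries."""
    changed = 0
    n_mis = 0
    for i, new_values in new_sets.items():
        old_values = old_sets[i]
        changed += sum(1 for x_n, x_o in zip(new_values, old_values) if x_n != x_o)
        n_mis += len(new_values)
    return changed / n_mis if n_mis else 0.0


# Imputed values of two discrete variables (indices 1 and 2) in two rounds.
old_round = {1: [1, 1, 1, 1], 2: [0, 0]}
new_round = {1: [1, 2, 1, 1], 2: [0, 0]}

delta_f = second_sum(new_round, old_round)
print(delta_f)          # one of six imputed entries changed
print(delta_f < 0.01)   # False, so completion would continue
```

Under this assumed form the second sum is the fraction of imputed discrete entries that changed between rounds, so a preset difference of 1% means that fewer than one entry in a hundred is still changing.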

The preset difference can be selected according to actual conditions. Illustratively, the preset difference is 1%.

S302: Fusing the completed first data and second data with the third data to obtain the training sample data.

Fig. 4 is a schematic flow chart of a traffic state prediction method according to another embodiment of the present application. By way of example and not limitation, the method can be applied to a traffic command system, an emergency real-time command and dispatch system, an emergency aid decision-making system and a mobile phone map traffic state visualization system. As shown in fig. 4, the method includes:

S401: Acquiring first data, second data and third data.

The first data comprise historical traffic state data of a road section to be predicted, the second data comprise historical traffic state data of the upstream and downstream intersections of the road section to be predicted, and the third data comprise characteristics of the first data and characteristics of the second data, wherein the characteristics comprise spatial characteristics and/or temporal characteristics.

S402: Fusing the first data, the second data and the third data to obtain fused data.

S403: Obtaining a traffic state prediction result of the road section to be predicted from the fused data by using the traffic state prediction model.

The traffic state prediction model is a final traffic state prediction model obtained by training with any one of the methods described above.

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by functions and internal logic of the process, and should not constitute any limitation to the implementation process of the embodiments of the present application.

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 5, the electronic apparatus 5 of this embodiment includes: at least one processor 50 (only one shown in fig. 5), a memory 51, and a computer program 52 stored in the memory 51 and executable on the at least one processor 50, the steps of any of the various method embodiments described above being implemented when the computer program 52 is executed by the processor 50.

The electronic device 5 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The electronic device may include, but is not limited to, a processor 50 and a memory 51. Those skilled in the art will appreciate that fig. 5 is merely an example of the electronic device 5 and does not constitute a limitation of the electronic device 5, which may include more or fewer components than those shown, combine some of the components, or use different components, such as an input-output device, a network access device, etc.

The processor 50 may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 51 may in some embodiments be an internal storage unit of the electronic device 5, such as a hard disk or a memory of the electronic device 5. In other embodiments, the memory 51 may be an external storage device of the electronic device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the electronic device 5. Further, the memory 51 may include both an internal storage unit and an external storage device of the electronic device 5. The memory 51 is used for storing an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 51 may also be used to temporarily store data that has been output or is to be output.

It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the foregoing method embodiments.

An embodiment of the present application provides a computer program product which, when run on a mobile terminal, enables the mobile terminal to implement the steps in the above method embodiments.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include at least: any entity or device capable of carrying computer program code to a photographing apparatus/electronic device, a recording medium, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals or telecommunications signals.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the above-described apparatus/network device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims

1. A traffic state prediction model construction method is characterized by comprising the following steps:

2. The method of claim 1, wherein constructing a traffic status prediction model based on the training sample data and screening out associated features comprises:

in the training process, calculating the importance of the features;

3. The method of claim 2, wherein calculating the importance of the features comprises:

obtaining the score of the feature for improving the traffic state prediction model during each segmentation;

a square weighting of the score is calculated.

4. The method of claim 1, wherein building a final traffic state prediction model based on the associated features comprises:

5. The method of claim 2, wherein the predetermined loss function is a squared loss function;

the objective function of the traffic state prediction model is as follows:

is a predicted value obtained by the traffic state prediction model at step t-1, f_t(x_t) is a transformation function, x_t is an attribute, and Ω(f_i) is the regularization operation of the i-th tree,

6. The method of claim 1, wherein fusing the first data, the second data, and the third data to obtain training sample data comprises:

7. The method of claim 6, wherein completing the first data and the second data comprises:

sorting the features correspondingly according to the total amount of missing data of each feature and the attribute;

8. The method of claim 7, wherein if the variable is a continuous variable, initializing the value of the missing data with the median of the adjacent time segments or all time segments, specifically comprising:

9. The method of claim 8, wherein calculating differences between each of the first new data sets and the corresponding first old data set and summing the differences to obtain a first sum comprises:

the calculation is performed according to the following formula:

the calculation is performed according to the following formula:

10. The method of claim 1, wherein obtaining the first data specifically comprises:

11. The method of claim 1, wherein obtaining second data specifically comprises:

12. A traffic state prediction method, comprising:

acquiring first data, second data and third data, wherein the first data comprise historical traffic state data of a road section to be predicted, the second data comprise historical traffic state data of an upstream intersection and a downstream intersection of the road section to be predicted, the third data comprise characteristics of the first data and characteristics of the second data, and the characteristics comprise spatial characteristics and/or temporal characteristics;

fusing the first data, the second data and the third data to obtain fused data;

wherein the traffic state prediction model is a final traffic state prediction model trained by the method of any one of claims 1-11.

13. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 11 or the method of claim 12.