CN113988488A - Method for predicting ETC passing probability of vehicle by multiple factors - Google Patents

Method for predicting ETC passing probability of vehicle by multiple factors

Info

Publication number
CN113988488A
Authority
CN
China
Prior art keywords
probability
result
decision tree
data
factor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111610092.1A
Other languages
Chinese (zh)
Other versions
CN113988488B (en)
Inventor
朱广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yihi Information Technology Service Co ltd
Shanghai Yihi Chengshan Automobile Rental Co ltd
Original Assignee
Shanghai Yihi Information Technology Service Co ltd
Shanghai Yihi Chengshan Automobile Rental Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yihi Information Technology Service Co ltd, Shanghai Yihi Chengshan Automobile Rental Co ltd filed Critical Shanghai Yihi Information Technology Service Co ltd
Priority to CN202111610092.1A priority Critical patent/CN113988488B/en
Publication of CN113988488A publication Critical patent/CN113988488A/en
Application granted granted Critical
Publication of CN113988488B publication Critical patent/CN113988488B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/01 - Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 - Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 - Services
    • G06Q50/26 - Government or public services
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Game Theory and Decision Science (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Educational Administration (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention provides a method for predicting the probability of a vehicle passing through ETC from multiple factors, comprising the following steps: S1, establishing, according to historical data, a decision tree model for predicting the probability of the vehicle passing through the ETC based on multiple factors; S2, inputting real-time data into the decision tree model of S1 for computation to obtain a prediction result of whether the vehicle corresponding to the real-time data passes through the ETC; and S3, comparing the actual result with the prediction result obtained in S2 and adjusting the decision tree model to improve prediction accuracy. The method can predict real-time ETC data from historical data and the model algorithm, greatly reducing the cost of data matching and checking; by comparing predicted and actual results, the model continuously learns on its own and its computing capability is strengthened.

Description

Method for predicting ETC passing probability of vehicle by multiple factors
Technical Field
The invention relates to the field of vehicle driving, in particular to a method for predicting the probability of a vehicle passing ETC.
Background
At present, data on vehicles passing through an ETC (electronic toll collection) system are generally acquired after the fact and by manual means: information such as road sections and fees is obtained periodically from a road-network operator or a bank, and is then matched and settled.
In this traditional ETC data acquisition and processing mode, data acquisition suffers from a long lag on the one hand, while manual matching and reconciliation require a large amount of manpower on the other, so the cost is high.
The invention provides a method for predicting the probability that a vehicle passes through ETC from multiple factors. Historical ETC data that actually occurred are input into a model for training, which yields, over the full historical data, the probability that a vehicle passes through the ETC under different combinations of factors. After new order data are generated, all known factor data are input into the model to obtain the probability of passing through the ETC under the various factor combinations, and the decision results of all factor groups are finally voted on to obtain the overall probability of passing through the ETC.
Because the factor data are known in real time, the prediction is also available in real time, which solves the lag problem of the traditional scheme. After the actual data are obtained, the predicted value is compared with the actual value to ensure final consistency of the data; the actual value then re-enters the model as historical data for the computation, further strengthening the model's predictive ability.
Disclosure of Invention
At present, the traditional ETC data acquisition and processing mode has a long lag, and manual matching and accounting require a large amount of manpower, so the cost is high. To solve these problems, the invention provides a method for predicting the ETC passing probability of a vehicle from multiple factors: it can predict real-time ETC data from historical data and a model algorithm, greatly reducing the cost of data matching and checking; by comparing predicted and actual results, the model continuously learns on its own, strengthening its computing capability; and because the method works in real time, the ETC passage probability and fee can be predicted immediately and pre-charged, improving customer convenience and saving settlement cost.
In order to achieve the above object, the present invention provides a method for predicting a probability of a vehicle passing an ETC by multiple factors, comprising the following steps:
S1, according to historical data, establishing a decision tree model for predicting the probability of the vehicle passing through the ETC based on M factors;
S2, inputting real-time data into the decision tree model of S1 for computation to obtain a prediction result of whether the vehicle corresponding to the real-time data passes through the ETC;
and S3, comparing the actual result with the prediction result obtained in S2, adjusting the decision tree model, and enhancing the accuracy of prediction.
Wherein, the step of S1 further comprises the following steps:
S11, calculating the probability value of each factor passing through the ETC according to historical data, and accordingly determining the splitting priority of all M factors;
S12, randomly extracting m factors from the M factors, where the value range of m is 2 < m < M, and establishing a decision tree for the m factors according to the splitting priority determined in S11;
S13, repeating the process of S12 until all factor combinations are traversed, generating a large number of decision trees that form a random forest;
and S14, counting, over the full historical data, the probability that each leaf node of all decision trees in S13 passes the ETC, to obtain the decision tree model.
Wherein, the step of S11 further comprises the following steps:
S111, calculating a probability value of each factor passing through the ETC;
and S112, determining the splitting priority of all factors.
Wherein, the step of S12 further comprises the following steps:
S121, according to the ETC passing probability value of each factor calculated in S11, selecting the factor with the highest splitting priority among the m factors as the root node of the decision tree for splitting;
S122, establishing child nodes for the different values of the factor with the highest splitting priority, generating the second-layer nodes;
S123, for the second-layer nodes, selecting the factor with the next-highest splitting priority (i.e., the largest information gain) for splitting;
and S124, repeating this process, selecting splitting factors in order of splitting priority from high to low, until no factor remains to be selected.
Wherein, in the process of establishing the decision tree, splitting stops if the branch probability exceeds the decision probability P1, or once all m factors have been split.
Wherein, the step S2 specifically includes the following steps:
S21, when a new piece of real-time data is generated, substituting all factor data of that record into the decision tree model obtained in S1 to obtain, in each decision tree, the probability of passing the ETC at every leaf node meeting the conditions;
S22, all leaf nodes meeting the conditions participate in voting to obtain the prediction result and the adjusted reference probability P2: if any leaf node shows a probability of passing the ETC that exceeds the decision probability P1, the voting ends and the prediction result is that the vehicle can pass the ETC, and the value of P2 is the equal-weight average of all qualifying leaf nodes whose probabilities are less than the decision probability P1; if no leaf node's probability exceeds the decision probability P1, the prediction result is that the vehicle cannot pass the ETC, and the value of P2 is the equal-weight average of all qualifying leaf nodes.
Specifically, in S3, after the actual result occurs, the actual result is returned to the decision tree model in S1, and compared with the predicted result obtained in S2:
if the comparison result is consistent, the data are stored in the source database, further enhancing the prediction accuracy of the decision tree model in S1;
if the comparison result is not consistent, the actual data and the result are returned to the decision tree model in S1, the newly added data are subjected to probability operation at regular time, the probability value of the related single factor in the decision tree model is updated, the splitting sequence of the decision tree is changed, the result data of each leaf node is changed at the same time, and the decision tree model is refreshed.
If an (M+1)-th factor is found in the comparison result, the S1 process of the model is re-run and the model is expanded.
Wherein, in the process of establishing the model, the value of the decision probability P1 is fixed; after the actual results are obtained, the value of P1 can be adjusted according to the prediction accuracy.
Wherein, if the comparison between the actual result and the predicted result shows that the prediction accuracy is higher than that required by the product, the value of the decision probability P1 can be gradually reduced while ensuring that the adjusted P1 remains greater than the adjusted reference probability P2; if the comparison shows that the prediction accuracy is lower than that required by the product, the value of P1 is gradually increased.
In summary, the present invention inputs the ETC data actually occurred in history into the model for training. And obtaining the probability condition that the vehicle passes through the ETC under the condition of different kinds of factor combinations according to the total amount of historical data. And finally voting decision results of all groups of factors to obtain the probability of finally passing the ETC. Because the factor data is known in real time, the prediction of the outcome is also available in real time. The problem of hysteresis in the traditional scheme is solved. After actual data are obtained, comparison between the predicted value and the actual value is carried out, and final consistency of the data is guaranteed. And after the actual value is obtained, the model is entered again to become historical data of operation, and the prediction capability of the model is further enhanced.
The method can predict the real-time ETC data according to the historical data and the model algorithm, thereby greatly reducing the cost of data matching and checking; the invention can continuously learn by self by comparing the data of the prediction result and the actual result, thereby strengthening the operational capability of the model.
Drawings
FIG. 1 is a schematic representation of the steps of a method of multi-factor prediction of the probability of a vehicle passing an ETC in accordance with the present invention;
fig. 2 is a decision tree obtained according to 5 factors in the embodiment.
Detailed Description
Technical solutions, structural features, achieved objects and effects in the embodiments of the present invention will be described in detail below with reference to fig. 1 to 2 in the embodiments of the present invention.
It should be noted that the drawings are simplified in form and not drawn to precise scale; they are provided only for convenience and clarity in describing the embodiments of the present invention and do not limit the conditions of those embodiments. Any structural modification, change in proportional relationship, or adjustment in size that does not affect the function and purpose achievable by the present invention should fall within the scope of its technical content.
It is to be noted that, in the present invention, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
A method for predicting the probability of a vehicle passing the ETC by multiple factors, as shown in FIG. 1, comprises the following steps.
S1, establishing, according to historical data, a decision tree model for predicting the ETC passing probability of the vehicle from multiple factors;
S11, calculating the probability value of each factor passing through the ETC, and determining the splitting priority of all factors;
S111, calculating a probability value of each factor passing through the ETC;
assuming that there are M factors, M is 10 in the present embodiment, and the 10 factors are "ETC entrance passing speed >5km/h (hereinafter, referred to as entrance vehicle speed >5 km/h)", "ETC exit passing speed >5km/h (hereinafter, referred to as exit vehicle speed >5 km/h)", "entrance congestion level < 3", "exit congestion level < 3", "driver driving age >5 years", "driver gender = male", "rainy while passing", "time of passing daytime", "legal holiday", "road category is national road", wherein a congestion level of 1 indicates heavy congestion, a level of 2 indicates medium congestion, a level of 3 indicates light congestion, and a level of 4 indicates clear traffic.
The probability of passing the ETC is calculated factor by factor from all historical data. For example, all records containing a certain factor A are extracted from the historical data, giving x records; if y of these x records actually passed the ETC in the end, the probability that factor A passes the ETC is y/x. Specifically, suppose 500 records in the historical data correspond to "entrance vehicle speed >5 km/h", and 480 of those 500 records passed the ETC in the final actual situation; then, when the factor "entrance vehicle speed >5 km/h" occurs, the probability of passing the ETC is 480/500, about 96%.
Repeating the above process, the probability value of each factor passing through the ETC can be obtained.
S112, determining the splitting priority of all factors;
and determining the splitting priority according to the probability value of each factor from large to small, wherein the higher the probability value passing through the ETC is, the higher the splitting priority is, namely, the factor with higher splitting priority is selected, and when the probability values of the two factors are equal, one factor is selected to split preferentially at random.
S12, randomly extracting m factors from the M factors, where the value range of m is 2 < m < M, and establishing a decision tree for the m factors according to the splitting priority determined in S11;
in practice, the value of m can be adjusted according to the size of M and the prediction results observed after operation. In this embodiment m is 5, and the m factors are "entrance vehicle speed >5 km/h", "exit vehicle speed >5 km/h", "entrance congestion level <3", "exit congestion level <3", and "driver driving age >5 years".
S121, according to the probability value of each factor passing through the ETC calculated in S11, the factor with the highest probability value, i.e., the largest information gain, is selected from the 5 factors in S12 as the root node of the decision tree for splitting; among these 5 factors the highest splitting priority belongs to "entrance vehicle speed >5 km/h";
S122, child nodes are established for the different values of the factor with the highest splitting priority, generating the second-layer nodes; in this embodiment the two values are "entrance vehicle speed >5 km/h" satisfied and not satisfied;
S123, for the second-layer nodes, the factor with the next-highest splitting priority is selected according to information gain for further splitting; in this embodiment, "exit vehicle speed >5 km/h" is selected as the factor for the next split;
S124, this process is repeated, selecting splitting factors in order of splitting priority from high to low until no factor remains; in this embodiment, "entrance congestion level <3", "exit congestion level <3", and "driver driving age >5 years" are selected in turn as the splitting factors.
In the above S12 process, if no pruning is performed, the m factors form a decision tree with m layers in total; 2^m − 1 splits occur, producing 2^m results. "No pruning" means that child nodes are established in turn for the factors at every level and no branch is omitted; the probability at a child node established for a particular value of a factor is the branch probability.
In the process of establishing the decision tree, splitting stops if the branch probability exceeds the decision probability P1, or once all m factors have been split. In this embodiment the decision probability P1 is 95%; the decision tree obtained from the 5 factors selected in this embodiment is shown in fig. 2.
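As a rough sketch only, one way such a tree could be built in code, under the same assumed record layout as the earlier sketch; the helper names (`subset_pass_probability`, `build_tree`) and the nested-dict tree representation are illustrative assumptions, not the patent's own implementation.

```python
def subset_pass_probability(history, conditions):
    """ETC-pass probability among historical records matching every (factor, value) pair."""
    matching = [r for r in history
                if all(r["factors"].get(f) == v for f, v in conditions)]
    if not matching:
        return None
    return sum(r["passed_etc"] for r in matching) / len(matching)

def build_tree(history, ordered_factors, p1, conditions=()):
    """ordered_factors: the m chosen factors, sorted by splitting priority (S11).
    Returns a nested dict whose leaves carry the pass probability of their condition path."""
    prob = subset_pass_probability(history, conditions)
    # Stop splitting when the branch probability exceeds P1, or when no factor is left.
    if not ordered_factors or (conditions and prob is not None and prob > p1):
        return {"leaf": True, "conditions": conditions, "probability": prob}
    factor, rest = ordered_factors[0], ordered_factors[1:]
    return {
        "leaf": False,
        "factor": factor,
        # One child per value of the binary factor: condition satisfied / not satisfied.
        "children": {value: build_tree(history, rest, p1, conditions + ((factor, value),))
                     for value in (True, False)},
    }
```

With P1 = 0.95 this reproduces the shape of fig. 2: the "entrance vehicle speed >5 km/h satisfied" branch stops immediately at 96%, while the unsatisfied branch keeps splitting on the remaining factors.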
S13, repeating the process of S12 until all factor combinations are traversed, and generating a large number of decision trees to form a random forest;
"All factor combinations" means every way of randomly drawing m attributes from the M attributes; the number of possible factor combinations can be calculated from the combination formula C(M, m), and it equals the number of decision trees generated in the subsequent steps.
For example, if there are 4 factors in total and 3 of them are randomly drawn to form a combination, there are 4 possible combinations and finally 4 decision trees are formed; in this embodiment, M is 10 and m is 5, so 252 decision trees are finally formed.
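A brief sketch of this forest construction, reusing the assumed helpers from the previous sketches (`splitting_priority`, `build_tree`); Python's `itertools.combinations` and `math.comb` simply enumerate and count the C(M, m) combinations.

```python
from itertools import combinations
from math import comb

def build_forest(history, all_factors, m, p1):
    """One decision tree per m-factor combination drawn from the M factors (S12-S13)."""
    priority = splitting_priority(history)                 # from the earlier sketch
    rank = {f: i for i, f in enumerate(priority)}
    forest = []
    for combo in combinations(all_factors, m):             # C(M, m) combinations in total
        ordered = sorted(combo, key=lambda f: rank.get(f, len(rank)))  # highest priority first
        forest.append(build_tree(history, ordered, p1))
    assert len(forest) == comb(len(all_factors), m)        # with M = 10, m = 5: 252 trees
    return forest
```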
S14, counting the probability that leaf nodes of all decision trees in S13 pass through ETC in the full-scale historical data to obtain a decision tree model;
in the decision tree shown in fig. 2, the end point of each branch is a leaf node, each leaf node represents the probability that a vehicle passes through the ETC under certain factor combinations, and the probability is obtained by statistical calculation through historical data: in the total amount of historical data, the statistics accord with the' entrance vehicle speed>The probability that the vehicle with the condition of 5km/h passes through the ETC is 96 percent and is greater than the set judgment probability P1If yes, stopping splitting to obtain a leaf node; statistics of non-compliance with' entry speed of vehicle>5km/h 'but corresponding to' exit vehicle speed>The probability that the vehicle with the condition of 5km/h passes through the ETC is 96 percent and is greater than the set judgment probability P1If yes, stopping splitting to obtain a leaf node; statistics of non-compliance with' entry speed of vehicle>5km/h 'and not corresponding to' exit vehicle speed>5km/h 'and meets the requirement of' entrance congestion<3' vehicle, the probability of passing ETC is less than the set judgment probability P1Then continue splitting; and repeating the steps to obtain each leaf node.
The leaf nodes in the decision tree of fig. 2 represent the following meanings:
the leaf node a represents "the probability of a vehicle passing through the ETC is 96% when the condition entrance vehicle speed >5km/h is satisfied";
the leaf node b indicates "when the condition entrance vehicle speed >5km/h is not satisfied and the condition exit vehicle speed >5km/h is satisfied, the probability that the vehicle passes the ETC is 96%";
the leaf node c indicates "when the conditions of entrance vehicle speed >5km/h and exit vehicle speed >5km/h are not satisfied, entrance congestion <3 satisfied, exit congestion <3 satisfied, driving age >5 years satisfied, the probability of the vehicle passing the ETC is 90%";
the leaf node d indicates "when the conditions of entrance vehicle speed >5km/h and exit vehicle speed >5km/h are not satisfied, entrance congestion <3 satisfied, exit congestion <3 satisfied, driving age >5 years are not satisfied, the probability of the vehicle passing the ETC is 85%";
a leaf node e indicates "when the conditions of entrance vehicle speed >5km/h and exit vehicle speed >5km/h are not satisfied, entrance congestion <3 satisfied, exit congestion <3 unsatisfied, driving age >5 years satisfied, the probability of the vehicle passing the ETC is 85%";
the leaf node f indicates "when the conditions of entrance vehicle speed >5km/h and exit vehicle speed >5km/h are not satisfied, entrance congestion <3 satisfied, exit congestion <3 unsatisfied, driving age >5 years are not satisfied, the probability of the vehicle passing the ETC is 80%";
the leaf node g indicates "when the conditions of entrance vehicle speed >5km/h and exit vehicle speed >5km/h are not satisfied, entrance congestion <3 is not satisfied, exit congestion <3 is satisfied, and driving age >5 years is satisfied, the probability that the vehicle passes through the ETC is 90%";
a leaf node h represents "when the conditions of entrance vehicle speed >5km/h and exit vehicle speed >5km/h are not satisfied, entrance congestion <3 is not satisfied, exit congestion <3 is satisfied, and driving age >5 years is not satisfied, the probability that a vehicle passes through ETC is 85%";
the leaf node i indicates "when the conditions of entrance vehicle speed >5km/h and exit vehicle speed >5km/h are not satisfied, entrance congestion <3 is not satisfied, exit congestion <3 is not satisfied, and driving age >5 years are satisfied, the probability that the vehicle passes through the ETC is 85%";
the leaf node j indicates "when the conditions of entrance vehicle speed >5km/h and exit vehicle speed >5km/h are not satisfied, entrance congestion <3 is not satisfied, exit congestion <3 is not satisfied, and driving age >5 years is not satisfied, the probability of the vehicle passing the ETC is 80%".
Repeating the above process can obtain all leaf nodes of all decision trees.
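Continuing the same illustrative sketch (not the patent's own code), collecting every leaf with its condition path and probability is essentially what the resulting "decision tree model" stores.

```python
def collect_leaves(tree, acc=None):
    """Flatten a tree from build_tree into a list of (conditions, probability) leaves."""
    if acc is None:
        acc = []
    if tree["leaf"]:
        acc.append((tree["conditions"], tree["probability"]))
    else:
        for child in tree["children"].values():
            collect_leaves(child, acc)
    return acc
```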
S2, inputting the real-time data into the decision tree model in the S1 for operation to obtain a prediction result of the vehicle corresponding to the real-time data passing through the ETC;
and S21, when a new piece of real-time data is generated, substituting all factor data of the piece of data into the decision tree model obtained in the S1 to obtain the probability that each leaf node meeting the conditions passes through the ETC in each decision tree, and thus obtaining the probability that the vehicle passes through the ETC under different factor combinations of the piece of real-time data.
Wherein, a leaf node meeting the condition means that all factors of the leaf node are in the factor data of the piece of data; for example, factors in a new piece of real-time data include: the method is characterized in that the method does not meet the conditions of entrance vehicle speed >5km/h, exit vehicle speed >5km/h, entrance congestion <3, exit congestion <3, driving age >5 years, sex = male driver, raining when passing is not satisfied, passing time in the daytime, legal holidays on the same day and national road type, and in the decision tree shown in fig. 2, the leaf node meeting the conditions is c and the probability is 90%.
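Under the same assumptions as before, selecting the qualifying leaves for a new record could look like the following sketch, where `factors` is the record's factor dictionary and only leaves whose whole condition path matches are kept.

```python
def qualifying_leaves(forest, factors):
    """Probabilities of every leaf whose whole condition path matches the new record (S21)."""
    probs = []
    for tree in forest:
        for conditions, prob in collect_leaves(tree):
            if prob is not None and all(factors.get(f) == v for f, v in conditions):
                probs.append(prob)
    return probs
```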
S22, all leaf nodes meeting the conditions participate in voting to obtain the prediction result and the adjusted reference probability P2: if any leaf node shows a probability of passing the ETC that exceeds the decision probability P1, the voting ends and the prediction result is that the vehicle can pass the ETC, and the value of P2 is the equal-weight average of all qualifying leaf nodes whose probabilities are less than the decision probability P1; if no leaf node's probability exceeds the set decision probability P1, the prediction result is that the vehicle cannot pass the ETC, and the value of P2 is the equal-weight average of all qualifying leaf nodes.
In this embodiment P1 is 95%: if any leaf node shows a probability of passing the ETC above 95%, the prediction result is that the vehicle can pass the ETC; if all qualifying leaf nodes show probabilities below 95%, the prediction result is that the vehicle cannot pass the ETC.
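The voting rule as an illustrative sketch; the handling of the edge case where every qualifying leaf is at or above P1 is not spelled out in the description and is an assumption here.

```python
def vote(leaf_probs, p1):
    """S22 voting rule: returns (predicted_to_pass, reference probability P2)."""
    if any(p > p1 for p in leaf_probs):
        below = [p for p in leaf_probs if p < p1]
        # Edge case not specified in the patent: if every qualifying leaf is >= P1,
        # fall back to P1 itself as the reference probability.
        p2 = sum(below) / len(below) if below else p1
        return True, p2                      # some leaf exceeds P1: predicted to pass the ETC
    p2 = sum(leaf_probs) / len(leaf_probs) if leaf_probs else 0.0
    return False, p2                         # no leaf exceeds P1: predicted not to pass
```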
S3, comparing the actual result with the prediction result obtained in the S2, adjusting the model and enhancing the accuracy of prediction;
after the actual result occurs (the actual result is a fact result whether the vehicle corresponding to the new piece of real-time data can pass the ETC), the actual result is returned to the decision tree model in S1, and the actual result is compared with the predicted result obtained in S2: if the comparison result is consistent, the data is stored in the source database, and the prediction accuracy of the decision tree model in S1 is further enhanced.
If the comparison result is inconsistent, the data are pushed to the business system and an error-correction notice is sent. After the actual result occurs, the actual data and result are returned to the decision tree model in S1; the newly added data periodically take part in the probability calculation, the probability values of the relevant single factors in the decision tree model are updated, the splitting order of the decision tree changes, the result data of each leaf node change accordingly, and the decision tree model is refreshed.
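In the terms of the earlier sketches, this periodic refresh amounts to folding confirmed records back into the history and rebuilding the forest; the scheduling and storage details are not specified by the patent, so the function below is only an assumed outline.

```python
def refresh_model(history, confirmed_records, all_factors, m, p1):
    """Fold newly confirmed actual results back into the history and rebuild the forest."""
    history = history + confirmed_records    # actual results become historical data
    return history, build_forest(history, all_factors, m, p1)
```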
In the above process, if the actual result and the predicted result are consistent, the probability values of the relevant factors are raised, further reinforcing the model's current prediction tendency. Otherwise, the probability values of the relevant single factors are lowered, the splitting priority of those factors and the result data of the leaf nodes change accordingly, the model's current prediction tendency weakens, and the model evolves in other directions.
If an (M+1)-th factor is found, the S1 process of the model is re-run and the model is expanded. Because newly added factors do not affect the decision trees already generated, the overall prediction result remains relatively stable.
In the present invention, the decision probability P1 is fixed at first, but after actual results occur, P1 can be adjusted so that the prediction becomes more accurate: if the comparison of actual and predicted results shows that the prediction accuracy is very high (higher than the accuracy required by the product), the value of P1 can be gradually reduced while ensuring that the adjusted P1 remains greater than the adjusted reference probability P2; similarly, if the comparison shows that the prediction accuracy is lower than that required by the product, the value of P1 is gradually increased.
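A last sketch of this adjustment policy; the step size is an assumption, since the patent only fixes the direction of adjustment and the constraint that P1 stays above P2.

```python
def adjust_p1(p1, p2, accuracy, required_accuracy, step=0.01):
    """Lower P1 when accuracy exceeds the requirement, raise it otherwise,
    keeping the adjusted P1 above the adjusted reference probability P2."""
    if accuracy > required_accuracy:
        return max(p1 - step, p2 + 1e-6)     # never let P1 fall to or below P2
    return min(p1 + step, 1.0)
```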
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims (10)

1. A method for predicting the probability of a vehicle passing an ETC by multiple factors, characterized by comprising the following steps:
S1, establishing a decision tree model for predicting the probability of the vehicle passing through the ETC based on M factors according to historical data;
S2, inputting real-time data into the decision tree model of S1 for computation to obtain a prediction result of whether the vehicle corresponding to the real-time data passes through the ETC;
and S3, comparing the actual result with the prediction result obtained in S2, adjusting the decision tree model, and enhancing the accuracy of prediction.
2. The method according to claim 1, wherein said S1 further comprises the steps of:
S11, calculating the probability value of each factor passing through the ETC according to historical data, and accordingly determining the splitting priority of all M factors;
S12, randomly extracting m factors from the M factors, where the value range of m is 2 < m < M, and establishing a decision tree for the m factors according to the splitting priority determined in S11;
S13, repeating the process of S12 until all factor combinations are traversed, generating a large number of decision trees that form a random forest;
and S14, counting, over the full historical data, the probability that each leaf node of all decision trees in S13 passes the ETC, to obtain the decision tree model.
3. The method according to claim 2, wherein said S11 further comprises the steps of:
S111, calculating a probability value of each factor passing through the ETC;
and S112, determining the splitting priority of all factors.
4. The method according to claim 2, wherein said S12 further comprises the steps of:
S121, according to the ETC passing probability value of each factor calculated in S11, selecting the factor with the highest splitting priority among the m factors as the root node of the decision tree for splitting;
S122, establishing child nodes for the different values of the factor with the highest splitting priority, generating the second-layer nodes;
S123, for the second-layer nodes, selecting the factor with the next-highest splitting priority (i.e., the largest information gain) for splitting;
and S124, repeating this process, selecting splitting factors in order of splitting priority from high to low, until no factor remains to be selected.
5. The method according to claim 4, wherein, in establishing the decision tree, splitting stops if the branch probability exceeds the decision probability P1, or once all m factors have been split.
6. The method according to claim 5, wherein the step S2 specifically comprises the following steps:
S21, when a new piece of real-time data is generated, substituting all factor data of that record into the decision tree model obtained in S1 to obtain, in each decision tree, the probability of passing the ETC at every leaf node meeting the conditions;
S22, all leaf nodes meeting the conditions participate in voting to obtain the prediction result and the adjusted reference probability P2: if any leaf node shows a probability of passing the ETC that exceeds the decision probability P1, the voting ends and the prediction result is that the vehicle can pass the ETC, and the value of P2 is the equal-weight average of all qualifying leaf nodes whose probabilities are less than the decision probability P1; if no leaf node's probability exceeds the decision probability P1, the prediction result is that the vehicle cannot pass the ETC, and the value of P2 is the equal-weight average of all qualifying leaf nodes.
7. The method according to claim 1, wherein the step S3 is implemented by returning the actual result to the decision tree model of S1 after the actual result occurs, and comparing the actual result with the predicted result obtained in S2:
if the comparison result is consistent, the data are stored in the source database, further enhancing the prediction accuracy of the decision tree model in S1;
if the comparison result is not consistent, the actual data and the result are returned to the decision tree model in S1, the newly added data are subjected to probability operation at regular time, the probability value of the related single factor in the decision tree model is updated, the splitting sequence of the decision tree is changed, the result data of each leaf node is changed at the same time, and the decision tree model is refreshed.
8. The method for predicting the ETC passing probability by multiple factors according to claim 7, wherein if an (M+1)-th factor is found in the comparison result, the S1 process of the model is re-run and the model is expanded.
9. The method according to claim 6, wherein the value of the decision probability P1 is fixed during model building; after the actual result is obtained, the value of P1 can be adjusted according to the prediction accuracy.
10. The method according to claim 9, wherein, if the comparison of the actual result with the predicted result shows that the prediction accuracy is higher than the accuracy required by the product, the value of the decision probability P1 is gradually decreased while ensuring that the adjusted P1 remains greater than the adjusted reference probability P2; if the comparison shows that the prediction accuracy is lower than the accuracy required by the product, the value of P1 is gradually increased.
CN202111610092.1A 2021-12-27 2021-12-27 Method for predicting ETC passing probability of vehicle by multiple factors Active CN113988488B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111610092.1A CN113988488B (en) 2021-12-27 2021-12-27 Method for predicting ETC passing probability of vehicle by multiple factors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111610092.1A CN113988488B (en) 2021-12-27 2021-12-27 Method for predicting ETC passing probability of vehicle by multiple factors

Publications (2)

Publication Number Publication Date
CN113988488A true CN113988488A (en) 2022-01-28
CN113988488B CN113988488B (en) 2022-06-21

Family

ID=79734533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111610092.1A Active CN113988488B (en) 2021-12-27 2021-12-27 Method for predicting ETC passing probability of vehicle by multiple factors

Country Status (1)

Country Link
CN (1) CN113988488B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114852135A (en) * 2022-07-08 2022-08-05 八维通科技有限公司 Similar rail transit driving prediction method based on big data

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220734A (en) * 2017-06-26 2017-09-29 江南大学 CNC Lathe Turning process Energy Consumption Prediction System based on decision tree
CN108364467A (en) * 2018-02-12 2018-08-03 北京工业大学 A kind of traffic information prediction technique based on modified decision Tree algorithms
CN109003128A (en) * 2018-07-07 2018-12-14 太原理工大学 Based on improved random forest public bicycles website Demand Forecast method
CN109017799A (en) * 2018-04-03 2018-12-18 张锐明 A kind of new-energy automobile driving behavior prediction technique
CN110837841A (en) * 2018-08-17 2020-02-25 北京亿阳信通科技有限公司 KPI (Key performance indicator) degradation root cause identification method and device based on random forest
CN111210094A (en) * 2020-03-06 2020-05-29 青岛海信网络科技股份有限公司 Airport taxi automatic scheduling method and device based on real-time passenger flow prediction
CN112287603A (en) * 2020-10-29 2021-01-29 上海淇玥信息技术有限公司 Prediction model construction method and device based on machine learning and electronic equipment
CN113096388A (en) * 2021-03-22 2021-07-09 北京工业大学 Short-term traffic flow prediction method based on gradient lifting decision tree

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220734A (en) * 2017-06-26 2017-09-29 江南大学 CNC Lathe Turning process Energy Consumption Prediction System based on decision tree
CN108364467A (en) * 2018-02-12 2018-08-03 北京工业大学 A kind of traffic information prediction technique based on modified decision Tree algorithms
CN109017799A (en) * 2018-04-03 2018-12-18 张锐明 A kind of new-energy automobile driving behavior prediction technique
CN109003128A (en) * 2018-07-07 2018-12-14 太原理工大学 Based on improved random forest public bicycles website Demand Forecast method
CN110837841A (en) * 2018-08-17 2020-02-25 北京亿阳信通科技有限公司 KPI (Key performance indicator) degradation root cause identification method and device based on random forest
CN111210094A (en) * 2020-03-06 2020-05-29 青岛海信网络科技股份有限公司 Airport taxi automatic scheduling method and device based on real-time passenger flow prediction
CN112287603A (en) * 2020-10-29 2021-01-29 上海淇玥信息技术有限公司 Prediction model construction method and device based on machine learning and electronic equipment
CN113096388A (en) * 2021-03-22 2021-07-09 北京工业大学 Short-term traffic flow prediction method based on gradient lifting decision tree

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114852135A (en) * 2022-07-08 2022-08-05 八维通科技有限公司 Similar rail transit driving prediction method based on big data
CN114852135B (en) * 2022-07-08 2022-10-04 八维通科技有限公司 Similar rail transit driving prediction method based on big data

Also Published As

Publication number Publication date
CN113988488B (en) 2022-06-21

Similar Documents

Publication Publication Date Title
CN111563706A (en) Multivariable logistics freight volume prediction method based on LSTM network
CN113096388B (en) Short-term traffic flow prediction method based on gradient lifting decision tree
CN110782658B (en) Traffic prediction method based on LightGBM algorithm
CN111105104A (en) Short-term power load prediction method based on similar day and RBF neural network
CN110674999A (en) Cell load prediction method based on improved clustering and long-short term memory deep learning
CN111723929A (en) Numerical prediction product correction method, device and system based on neural network
CN104835103A (en) Mobile network health evaluation method based on neural network and fuzzy comprehensive evaluation
CN111582559B (en) Arrival time estimation method and device
CN111860989B (en) LSTM neural network short-time traffic flow prediction method based on ant colony optimization
CN110675029A (en) Dynamic management and control method and device for commercial tenant, server and readable storage medium
CN112529683A (en) Method and system for evaluating credit risk of customer based on CS-PNN
CN111126868B (en) Road traffic accident occurrence risk determination method and system
CN113988488B (en) Method for predicting ETC passing probability of vehicle by multiple factors
CN116596044B (en) Power generation load prediction model training method and device based on multi-source data
CN112529685A (en) Loan user credit rating method and system based on BAS-FNN
CN115410372B (en) Reliable prediction method for highway traffic flow based on Bayesian LSTM
CN116721537A (en) Urban short-time traffic flow prediction method based on GCN-IPSO-LSTM combination model
CN115600729A (en) Grid load prediction method considering multiple attributes
CN114299742B (en) Speed limit information dynamic identification and update recommendation method for expressway
CN114819178A (en) Railway construction progress index prediction and online updating method
CN112528554A (en) Data fusion method and system suitable for multi-launch multi-source rocket test data
CN115293366B (en) Model training method, information prediction method, device, equipment and medium
CN115719194A (en) Big data prediction based material purchasing method and system
CN115691140A (en) Analysis and prediction method for space-time distribution of automobile charging demand
CN113191568B (en) Meteorological-based urban operation management big data analysis and prediction method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant