This application is a divisional application of invention patent application No. 201911084257.9, filed on November 7, 2019 and entitled 'Intelligent safety early warning method, device, system and storage medium for vehicles'.
Disclosure of Invention
The invention mainly aims to provide an intelligent safety early warning method, device and system for a vehicle and a storage medium, which are used for solving the technical problems that early warning is not timely and the driving level cannot be objectively reflected in the prior art.
As a first aspect of the present invention, an embodiment of the present invention provides an intelligent safety early warning method for a vehicle, the method including:
a pre-trip early warning step: establishing a first data model based on environmental information of a path to be traveled by the vehicle and real-time traffic and road condition information of that environment, and sending first early warning information;
an in-line early warning step: establishing a second data model to predict risk based on the real-time position information of the vehicle, the environment information of the vehicle and the condition information of the vehicle, and sending second early warning information;
wherein establishing the second data model to predict risk in the in-line early warning step includes:
integrating data source data including position information, environment information and vehicle condition information;
partitioning data source data, and respectively dividing the data source data into a training set, a verification set and a test set;
respectively processing the data variables in the training set, the verification set and the test set to generate characteristic variables which accord with a second preset condition and are to be input into the candidate prediction model;
sequentially operating respective characteristic variables in each candidate prediction model according to the sequence of training, verifying and testing data to obtain the accuracy and the prediction result under each candidate model;
and selecting an optimal prediction model from the candidate prediction models to serve as the second data model.
Preferably, in the in-line early warning step, the information for establishing the second data model further includes historical driving information.
Preferably, the second data model is further used for predicting risk according to the driving level of the driver evaluated by the historical driving information.
Preferably, processing the data variables in the training set, the verification set and the test set respectively to generate the feature variables that meet the second preset condition and are to be input into the candidate prediction model includes:
the variable conversion is to select specified data variables from the training set, the verification set and the test set respectively and convert the data variables into data variables of data types which can be identified by the candidate prediction model;
performing variable clustering, and aggregating the data variables with similar characteristic expressions in the training set, the verification set and the test set according to characteristic similarity to reduce data dimensionality in a data source;
data merging, namely inputting the derived variables and the original variables into a preset prediction model together to generate a plurality of characteristic variables;
and selecting the characteristic variables meeting the second preset condition.
Preferably, the data in the training set and/or validation set and/or test set comprises at least one of mileage, vehicle speed, driving duration, weather, wind, temperature, road type, whether off-road, whether lane change, whether overspeed, and vehicle distance.
Preferably, in the step of sequentially running the respective feature variables in each candidate prediction model in the order of training, verification and test data to obtain the accuracy and the prediction result of each candidate model, the prediction accuracy of each candidate prediction model is checked using an ROC curve.
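As a hedged illustration of the ROC check, the following minimal sketch computes ROC points and the area under the curve (AUC) from labels and predicted risk scores. The function names `roc_points` and `auc` and the sample data are assumptions made for illustration only; the source does not specify how the curve is computed.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, false positive rate, true positive rate) for each distinct score."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # A sample is predicted "risky" when its score reaches the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve by the trapezoidal rule, starting from (0, 0)."""
    curve = [(0.0, 0.0)] + [(fpr, tpr) for _, fpr, tpr in roc_points(labels, scores)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Under these assumptions, each candidate prediction model could be scored with `auc` on the same held-out data and the values compared; a larger AUC indicates better separation of risky and non-risky samples.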
Preferably, the partitioning the data source data into the training set, the verification set and the test set respectively includes:
taking A% of data in data source data as a training set;
taking B% of data in the data source data as a verification set;
taking C% in data source data as a test set;
wherein A ≥ B ≥ C, and A + B + C = 100.
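The partitioning rule can be sketched as follows. This is a minimal example only: the function name `partition`, the random shuffling before the split and the fixed seed are assumptions not stated in the source, which only fixes the constraint A ≥ B ≥ C with A + B + C = 100.

```python
import random
from typing import List, Sequence, Tuple


def partition(records: Sequence, a: float, b: float, c: float,
              seed: int = 0) -> Tuple[List, List, List]:
    """Split records into training (A%), verification (B%) and test (C%) sets."""
    if not (a >= b >= c >= 0) or abs(a + b + c - 100) > 1e-9:
        raise ValueError("require A >= B >= C and A + B + C = 100")
    shuffled = list(records)
    # Shuffling is an assumption; the source does not say how rows are assigned.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * a / 100)
    n_valid = round(len(shuffled) * b / 100)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # the remainder forms the test set
    return train, valid, test
```

For example, `partition(rows, 60, 30, 10)` would yield the 60/30/10 split used later in the embodiment.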
As a second aspect of the present invention, an embodiment of the present invention provides an intelligent safety early warning device for a vehicle, the device including:
a pre-trip early warning device, configured to establish a first data model and send first early warning information based on environmental information of a path to be traveled by the vehicle and real-time traffic and road condition information of that environment;
the in-line early warning device is used for establishing a second data model to predict risks and sending second early warning information based on the real-time position information of the vehicle, the environment information of the vehicle and the condition information of the vehicle;
the in-line warning device is further configured to:
integrating data source data including position information, environment information and vehicle condition information;
partitioning data source data, and respectively dividing the data source data into a training set, a verification set and a test set;
respectively processing the data variables in the training set, the verification set and the test set to generate characteristic variables which accord with a second preset condition and are to be input into the candidate prediction model;
sequentially operating respective characteristic variables in each candidate prediction model according to the sequence of training, verifying and testing data to obtain the accuracy and the prediction result under each candidate model;
and selecting an optimal prediction model from the candidate prediction models to serve as the second data model.
As a third aspect of the present invention, an embodiment of the present invention provides an intelligent safety early warning system for a vehicle, the system including:
at least one vehicle-mounted terminal electronic device, each vehicle-mounted terminal electronic device comprising: a first processor, a first memory communicatively coupled to the first processor;
a server, the server comprising: at least one second processor, a second memory communicatively coupled to the second processor;
wherein the first memory stores instructions executable by the first processor to implement the method of any one of the preceding claims; or
the second memory stores instructions executable by the second processor to implement the method of any one of the preceding claims.
As a fourth aspect of the present invention, the present invention provides a computer-readable storage medium, wherein the storage medium stores computer program instructions which, when executed by a processor, implement the method according to any one of the foregoing.
In summary, the intelligent safety early warning method, the intelligent safety early warning device, the intelligent safety early warning system and the storage medium for the vehicle provided by the embodiment of the invention can objectively reflect the driving level, and carry out safety risk early warning by combining real-time road condition information, weather conditions and vehicle real-time data, so that the occurrence rate of traffic accidents is reduced.
Example 1
Referring to fig. 1 to 5b, embodiment 1 of the present invention provides an intelligent safety early warning method for a vehicle, which mainly draws on multi-dimensional, multi-space vehicle data and environment data and applies a big data method to provide safety early warning while the vehicle is being driven. The vehicle may be a land vehicle (an unmanned automobile, an electric automobile or any other type of land vehicle) or another type of vehicle, such as an amphibious vehicle or an air-land vehicle. Taking a land vehicle as an example, the method analyzes the real-time data of the vehicle and the external environment data of the current vehicle and, using a big data method, combines the historical driving data of the vehicle and of its vehicle type with the historical driving data of other vehicle types whose environmental data and accident characteristics match. A driving evaluation actuarial model is used to comprehensively evaluate the driving condition of the driver from multi-dimensional data such as the distribution of driving behavior, mileage, travel conditions, violent driving, weather and road conditions, so that the driving level of the driver is objectively reflected, and safety risk early warning is carried out by combining real-time road condition information, weather conditions and real-time vehicle data. The intelligent safety early warning method for the vehicle in embodiment 1 of the invention mainly includes the following steps:
S1, a pre-trip early warning step: establishing a first data model based on the environmental information of the path to be traveled by the vehicle and the real-time traffic and road condition information of that environment, and sending first early warning information;
S2, an in-line early warning step: establishing a second data model to predict risk based on the real-time position information, the environment information and the condition information of the vehicle, and sending second early warning information.
Preferably, the establishing of the first data model in the pre-trip warning step of step S1 includes:
integrating data source data including environment information and environment real-time traffic road condition information;
partitioning data source data, and respectively dividing the data source data into a training set, a verification set and a test set;
respectively processing the data variables in the training set, the verification set and the test set to generate characteristic variables which accord with a first preset condition and are to be input into a candidate prediction model;
sequentially operating respective characteristic variables in each candidate prediction model according to the sequence of training, verifying and testing data to obtain the accuracy and the prediction result under each candidate model;
and selecting an optimal prediction model from the candidate prediction models to serve as the first data model.
Preferably, the establishing of the first data model in the pre-trip early warning step further includes, before the partitioning the data source data into the training set, the verification set, and the test set, respectively:
and exploring basic information conditions of the data source data, wherein the basic information conditions comprise one or more of data missing, data abnormity, distribution conditions of data variables and correlation of the data variables.
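A minimal sketch of such an exploration pass is shown below. The helper names `explore` and `pearson`, the use of `None` to mark missing values and the z-score limit of 3 for flagging abnormal values are all assumptions chosen for illustration; the source does not prescribe particular statistics.

```python
import math
from typing import Dict, List, Optional, Sequence


def explore(column: Sequence[Optional[float]], z_limit: float = 3.0) -> Dict[str, object]:
    """Summarize one data variable: missing count, mean, spread and abnormal values."""
    present = [v for v in column if v is not None]
    missing = len(column) - len(present)
    mean = sum(present) / len(present)
    std = math.sqrt(sum((v - mean) ** 2 for v in present) / len(present))
    # Assumed rule: a value is abnormal when it lies more than z_limit
    # standard deviations away from the mean.
    outliers = [v for v in present if std > 0 and abs(v - mean) / std > z_limit]
    return {"missing": missing, "mean": mean, "std": std, "outliers": outliers}


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two data variables of equal length."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Running `explore` on each variable and `pearson` on pairs of variables gives a first view of missing data, abnormal data, distribution and correlation before the data is partitioned.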
Preferably, the processing the data variables in the training set, the verification set and the test set respectively to generate the feature variables meeting the first preset condition and to be input into the candidate prediction model includes:
the variable conversion is to select specified data variables from the training set, the verification set and the test set respectively and convert the data variables into data variables of data types which can be identified by the candidate prediction model;
performing variable clustering, and aggregating the data variables with similar characteristic expressions in the training set, the verification set and the test set according to characteristic similarity to reduce data dimensionality in a data source;
data merging, namely merging the variable obtained after variable conversion and the variable after variable clustering and aggregating to generate a plurality of characteristic variables;
and selecting the characteristic variables meeting the first preset condition.
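One possible reading of the preset condition, in line with the chi-squared screening mentioned later in this embodiment, is to keep the n feature variables most strongly associated with the risk label. The sketch below assumes categorical (for example 0/1) feature variables; the function names `chi_squared` and `select_top_n` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def chi_squared(feature: Sequence, label: Sequence) -> float:
    """Pearson chi-squared statistic of the feature/label contingency table."""
    n = len(feature)
    observed = Counter(zip(feature, label))
    feature_totals = Counter(feature)
    label_totals = Counter(label)
    score = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            expected = f_count * l_count / n  # count expected under independence
            score += (observed[(f, l)] - expected) ** 2 / expected
    return score


def select_top_n(features: Dict[str, Sequence], label: Sequence, n: int) -> List[str]:
    """Return the names of the n features with the largest chi-squared score."""
    ranked = sorted(features, key=lambda name: chi_squared(features[name], label),
                    reverse=True)
    return ranked[:n]
```

A feature whose score is close to zero is nearly independent of the label and would be a natural candidate for removal under this assumed criterion.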
Preferably, establishing the second data model to predict risk in the in-line early warning step of step S2 includes:
integrating data source data including position information, environment information and vehicle condition information;
partitioning data source data, and respectively dividing the data source data into a training set, a verification set and a test set;
respectively processing the data variables in the training set, the verification set and the test set to generate characteristic variables which accord with a second preset condition and are to be input into the candidate prediction model;
sequentially operating respective characteristic variables in each candidate prediction model according to the sequence of training, verifying and testing data to obtain the accuracy and the prediction result under each candidate model;
and selecting an optimal prediction model from the candidate prediction models to serve as the second data model.
Preferably, the processing the data variables in the training set, the verification set and the test set respectively to generate the feature variables meeting the second preset condition and to be input into the candidate prediction model includes:
the variable conversion is to select specified data variables from the training set, the verification set and the test set respectively and convert the data variables into data variables of data types which can be identified by the candidate prediction model;
performing variable clustering, and aggregating the data variables with similar characteristic expressions in the training set, the verification set and the test set according to characteristic similarity to reduce data dimensionality in a data source;
data merging, namely merging the variable obtained after variable conversion and the variable after variable clustering and aggregating to generate a plurality of characteristic variables;
and selecting the characteristic variables meeting the second preset condition.
Preferably, the variable conversion in the embodiment of the present invention includes at least one of:
1) An algorithm for the composite variable MTHour_i of driving mileage and time (hour);
the variable MTHour_i represents the total mileage traveled in each hour, i ∈ {1, 2, …, 24}; when i = 1, it represents the total driving mileage in the 1st hour; when i = 2, the total driving mileage in the 2nd hour; and when i = 24, the total driving mileage in the 24th hour;
MTHour_i is a composite variable of the driving mileage and time (hour) dimensions; compared with the existing single variable that uses the driving mileage alone, the composite variable generated by this data processing algorithm better reflects the projection of driving risk onto different time (hour) dimensions;
2) An algorithm for the composite variable St_i of driving speed and driving duration;
the variable St_i represents the total duration of travel within a specific speed range, i ∈ {1, 2, 3, 4, 5}; when i = 1, it represents the total duration of the low-speed driving stage; when i = 2, the total duration of the medium-low-speed driving stage; when i = 3, the total duration of the medium-speed driving stage; when i = 4, the total duration of the medium-high-speed driving stage; and when i = 5, the total duration of the high-speed driving stage;
St_i is a composite variable of the driving speed and driving duration dimensions; compared with the single dimension that uses the driving duration alone, the composite variable generated by this data processing algorithm better reflects the projection of driving risk across the driving duration.
3) A standardization algorithm for dangerous driving events;
the variable Em_i is defined as the total number of dangerous driving events collected within a specific time, where i ∈ {1, 2, 3, 4, 5, 6} denotes the 6 types of dangerous events; the variable Mt_i is defined as the total driving mileage collected over the same time as the variable Em_i; the standardized variable is obtained by normalizing Em_i against Mt_i.
Compared with the non-standardized count of dangerous driving events, the standardized dangerous-event variable obtained through data standardization largely eliminates the deviation caused by differences in driving mileage or observation time, and better reflects the real driving risk exposure.
4) A composite algorithm of driving mileage and weather;
the variable Mw_i is defined as the total mileage traveled in a specific type of weather, where i ∈ {1, 2, 3, 4}: i = 1 represents normal weather; i = 2 represents generally severe weather; i = 3 represents moderately severe weather; i = 4 represents particularly severe weather;
Mw_i is thus a composite variable of the driving mileage and weather dimensions; compared with the existing single variable that uses the driving mileage alone, the composite variable generated by this data processing algorithm better reflects the projection of driving risk onto different weather dimensions.
5) A composite algorithm of driving mileage and temperature;
the variable Mt_i is defined as the total driving mileage within a specific temperature range, where i ∈ {1, 2, 3, 4}: i = 1 represents the ultra-low temperature stage; i = 2 represents the low temperature stage; i = 3 represents the normal temperature stage; i = 4 represents the high temperature stage;
Mt_i is then a composite variable of the driving mileage and temperature dimensions; compared with the existing single variable that uses the driving mileage alone, the composite variable generated by this data processing algorithm better reflects the projection of driving risk onto different temperature dimensions.
6) A composite algorithm of driving mileage and wind force;
the variable Mwind_i is defined as the total mileage traveled within a specific wind force interval, where i ∈ {1, 2, 3, 4, 5}: i = 1 represents the wind stage; i = 2 represents a strong wind stage; i = 3 represents a strong wind stage; i = 4 represents the gusty wind stage; i = 5 represents the typhoon or hurricane stage;
Mwind_i is thus a composite variable of the driving mileage and wind force dimensions; compared with the existing single variable that uses the driving mileage alone, the composite variable generated by this data processing algorithm better reflects the projection of driving risk onto different wind force dimensions.
The data processing and conversion algorithm is not limited to the algorithms presented above; any algorithm that performs variable conversion and then combines the converted variables may also be used.
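The composite variables above can be illustrated with a short sketch. Everything concrete in it is an assumption: the per-segment record fields (`hour`, `mileage_km`, `duration_h`, `speed_kmh`, `weather`), the speed band boundaries in `SPEED_BANDS`, and the reading of the standardized event variable as events per unit mileage (Em_i divided by Mt_i). None of these values is given in the source.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical upper bounds (km/h) of the speed bands i = 1..5; assumed values.
SPEED_BANDS: List[Tuple[int, float]] = [
    (1, 30.0), (2, 60.0), (3, 90.0), (4, 120.0), (5, float("inf")),
]


def speed_band(speed_kmh: float) -> int:
    """Map a speed to its band index i (1 = low speed ... 5 = high speed)."""
    for index, upper in SPEED_BANDS:
        if speed_kmh < upper:
            return index
    return SPEED_BANDS[-1][0]


def composite_variables(segments: Iterable[dict]) -> Dict[str, Dict[int, float]]:
    """Aggregate trip segments into MTHour_i, St_i and Mw_i."""
    mthour = defaultdict(float)  # total mileage per hour of day
    st = defaultdict(float)      # total duration per speed band
    mw = defaultdict(float)      # total mileage per weather category
    for seg in segments:
        mthour[seg["hour"]] += seg["mileage_km"]
        st[speed_band(seg["speed_kmh"])] += seg["duration_h"]
        mw[seg["weather"]] += seg["mileage_km"]
    return {"MTHour": dict(mthour), "St": dict(st), "Mw": dict(mw)}


def normalized_events(em: Dict[int, int], mt: Dict[int, float]) -> Dict[int, float]:
    """Assumed standardization: dangerous events of type i per unit of mileage."""
    return {i: em[i] / mt[i] for i in em if mt.get(i, 0) > 0}
```

The temperature and wind force variables would follow the same pattern as `Mw`, keyed by a temperature or wind category instead of the weather category.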
Preferably, the first warning information includes: risk early warning information, vehicle condition fault information and driving assistance information; the second warning information includes: driving behavior information, risk early warning information, vehicle condition and fault information and driving assistance information.
Existing schemes process data with a delay and do not use a big data approach, so accurate early warning is difficult. Because this embodiment adopts a big data technology architecture, prediction is carried out separately with three models, namely a decision tree model, a logistic regression model and a neural network model, and the model with the best test performance is selected as the optimal model.
1. Data integration. Integrate the data-source data, such as multi-dimensional data on mileage, travel conditions, violent driving, weather and road conditions.
2. Data exploration. Explore the basic condition of the data, such as missing data, abnormal data, the distribution of the variables and the correlation between variables.
3. Data partitioning. 60% of the data is used as the training set, 30% as the verification set and 10% as the test set.
4. Variable conversion. Convert the data into data types supported by the model, for example by data discretization, data normalization or data regularization.
5. Variable clustering. Because the data source has many dimensions, and data in some dimensions may behave similarly as features and have little influence on the model, these similar features need to be aggregated, that is, the data needs dimensionality reduction, for example by Principal Component Analysis (PCA).
6. Data merging. Merge the converted variables and the clustered data to serve as the feature variables that enter the model.
7. Feature variable selection. Screen the features using methods such as R-squared and chi-squared, and select the top n features required.
8. Model running. Run each model in turn on the training, verification and test data to obtain results such as model accuracy and predictions.
9. Optimal model selection. Select the best-performing model using criteria such as model accuracy and the model's learning curve.
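The last two steps can be sketched as a small selection harness. This is only an illustration: the toy single-feature threshold model `fit_stump` stands in for the decision tree, logistic regression and neural network candidates named above, and choosing the winner by verification accuracy alone is an assumption (the source also mentions the learning curve).

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[Sequence[float], int]         # (feature vector, risk label)
Model = Callable[[Sequence[float]], int]  # maps features to a predicted label


def fit_stump(train: List[Row], feature_index: int) -> Model:
    """Toy candidate: the threshold on one feature that best fits the training set."""
    best_threshold, best_hits = 0.0, -1
    for features, _ in train:
        threshold = features[feature_index]
        hits = sum(1 for x, y in train if int(x[feature_index] >= threshold) == y)
        if hits > best_hits:
            best_threshold, best_hits = threshold, hits
    return lambda x: int(x[feature_index] >= best_threshold)


def accuracy(model: Model, rows: List[Row]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    return sum(1 for x, y in rows if model(x) == y) / len(rows)


def select_best(train: List[Row], valid: List[Row], test: List[Row],
                trainers: Dict[str, Callable[[List[Row]], Model]]):
    """Train every candidate, score it on verification and test data, pick the best."""
    results = {}
    for name, trainer in trainers.items():
        model = trainer(train)
        results[name] = {"valid": accuracy(model, valid),
                         "test": accuracy(model, test)}
    # Assumed criterion: highest verification accuracy wins.
    best = max(results, key=lambda name: results[name]["valid"])
    return best, results
```

In practice each entry of `trainers` would wrap a real training routine; the harness itself does not depend on the type of model.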
By adopting stream processing technology, the system can process information within seconds, so that the whole flow of 'vehicle, network-side first processing, system secondary processing, risk early warning output' is completed at the second level and early warning information based on real-time vehicle conditions and road conditions is delivered accordingly.
The early warning information based on real-time vehicle conditions and road conditions is divided into four major categories: vehicle condition and fault, driving behavior, risk early warning, and other assistance.
The whole risk early warning process is described below, taking the driving behavior category as an example. The training data, the verification data and the prediction data include data such as vehicle speed, driving time, weather, temperature, road type, whether the vehicle deviates from the road, whether the vehicle changes lanes, whether the vehicle is overspeeding, and vehicle distance;
1. the prediction data is data generated in real time during driving, so the model can use the real-time data to predict whether a safety risk exists and how severe it is.
3) Model inspection
1. Model training. The training data includes features such as mileage, speed, time, location, driving duration, road type, overspeed condition, road condition, meteorological conditions such as weather, temperature and wind force, whether the day is a holiday, fatigue driving, lane changes, and speed at expressway junctions.
2. Model verification. After model training, the model is evaluated using a cross-validation method (such as KFold) to obtain an evaluation result.
3. Visualization. Visualize the accuracy of the model training, verification and evaluation processes, and whether the model over-fits or under-fits the data.
A. ROC curve test. The prediction accuracy of the model is checked using an ROC curve. The closer the ROC curve is to the upper left corner, the higher the accuracy of the test. The point on the ROC curve closest to the upper left corner is the best threshold, with the fewest errors, that is, the smallest total number of false positives and false negatives. The AUC value is the area of the region under the ROC curve; the larger the AUC, the better the classification performance of the classifier.
B. LIFT cumulative lift chart. The chart allows a visual comparison of the gain in discrimination capability brought by different models or strategies.
C. Learning curve. The learning curve shows what state the model is in, whether under-fitting or over-fitting, and thus informs how to adjust the model.
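The cross-validation and fit diagnosis described in the steps above could be sketched as follows. The helper names `k_fold_indices` and `diagnose_fit` are hypothetical, and the `gap` and `floor` limits used to call a model over-fitted or under-fitted are assumed values, not figures from the source.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, held-out indices) for each of k consecutive folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        held_out = list(range(start, start + size))
        training = [i for i in range(n_samples) if i < start or i >= start + size]
        yield training, held_out
        start += size


def diagnose_fit(train_accuracy: float, valid_accuracy: float,
                 gap: float = 0.1, floor: float = 0.7) -> str:
    """Rough reading of one learning-curve point; gap and floor are assumed limits."""
    if train_accuracy - valid_accuracy > gap:
        return "over-fitting"    # fits training data much better than held-out data
    if train_accuracy < floor:
        return "under-fitting"   # fits neither training nor held-out data well
    return "reasonable fit"
```

Each fold's held-out indices serve once as verification data while the remaining indices train the model; averaging the per-fold accuracy gives the evaluation result.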
ROC curve test and LIFT cumulative lift chart for the model: