CN112633781A - Vehicle energy consumption evaluation method based on Internet of vehicles big data - Google Patents
- Publication number
- CN112633781A (application number CN202110248864.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- energy consumption
- vehicle
- real
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2477—Temporal data queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/26—Visual data mining; Browsing structured data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Human Resources & Organizations (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Economics (AREA)
- Marketing (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- Educational Administration (AREA)
- Development Economics (AREA)
- Game Theory and Decision Science (AREA)
- Operations Research (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Testing And Monitoring For Control Systems (AREA)
Abstract
The invention provides a vehicle energy consumption evaluation method based on Internet of vehicles big data, comprising a data acquisition stage, a training stage and a prediction stage. The data acquisition stage comprises: acquiring data to obtain original data, cleaning the data, and temporarily storing the data. The training stage comprises: data preprocessing, offline feature engineering, model training to generate an XGBoost model, monitoring of indexes, and visual display of the training results. The prediction stage comprises: real-time data processing, real-time feature engineering, real-time prediction, visual display of predicted values, and anomaly monitoring and alarming. The training stage is the basis of the prediction stage, which predicts using the XGBoost model generated in the training stage. The invention can monitor the energy consumption level of a vehicle in real time and, through its early warning system, provides a reliable basis for vehicle troubleshooting.
Description
Technical Field
The invention relates to a vehicle energy consumption evaluation method based on Internet of vehicles big data, and belongs to the field of vehicle energy consumption evaluation.
Background
With the continuous development of science and technology, engines are involved in more and more areas, researchers have worked hard on engines of various types and purposes, and increasing attention is paid to the economy of vehicles. Energy consumption analysis techniques already exist in the conventional automobile industry, but most of them rely on sampling analysis and evaluation of single vehicles or single-brand vehicles, and the vehicles must be driven to a specific site for testing.
In recent years, big data has expanded rapidly across industries, the data generated by industrial applications has grown explosively, and the Internet of Things has developed alongside it. Internet of vehicles big data has also reached an initial scale, but applications that combine big data with machine learning to evaluate vehicle energy consumption are still lacking. Defining the energy consumption level of vehicles by sampling and evaluating a small number of them can reveal some problems, but the approach is limited and its constraints are strict, so it cannot be applied on a large scale.
The traditional vehicle energy consumption evaluation method needs a large amount of offline sampling and testing to improve the generality and accuracy of its results, yet it cannot monitor in real time whether the energy consumption of each vehicle is abnormal. Big data technology can also address the problem that vehicle energy consumption evaluation could not be completed because of the many factors that affect energy consumption and the lack of a complete, scientific method for processing and analyzing the data. At present, the energy consumption level of a single vehicle can only be checked at a repair shop or testing agency. Combining big data real-time processing technology with machine learning allows the energy consumption level of every vehicle to be monitored in real time and overcomes the time and labor costs of the traditional detection method.
Disclosure of Invention
The invention provides a vehicle energy consumption evaluation method based on Internet of vehicles big data, and aims to solve the problem that the vehicle energy consumption level cannot be monitored in real time in the prior art.
The technical solution of the invention is as follows: a vehicle energy consumption evaluation method based on Internet of vehicles big data comprises a data acquisition stage, a training stage and a prediction stage. The data acquisition stage comprises: step 1-1) acquiring data to obtain original data, step 1-2) cleaning the data, and step 1-3) temporarily storing the data. The training stage comprises: step 2-1) data preprocessing, step 2-2) offline feature engineering, step 2-3) model training to generate an XGBoost model, step 2-4) monitoring indexes, and step 2-5) visual display of the training results. The prediction stage comprises: step 3-1) real-time data processing, step 3-2) real-time feature engineering, step 3-3) real-time prediction, step 3-4) visual display of the predicted value, and step 3-5) anomaly monitoring and alarming. The training stage is the basis of the prediction stage, and the prediction stage predicts on the basis of the XGBoost model generated in the training stage.
Further, acquiring data to obtain original data specifically includes: transmitting and collecting, in real time, the original message data collected by the vehicle terminal, and parsing the collected original message data in batches to obtain the original data.
Further, the data cleaning specifically includes: removing unreasonable data and null data from the original data obtained by data acquisition.
Further, temporarily storing the data specifically includes: temporarily storing the cleaned original data in a temporary storage module; original data that has been temporarily stored for more than three days is used as offline data.
Further, the data preprocessing specifically includes the following processes:
1) marking the continuity of the offline data with slice marks: offline data for which the time interval between two consecutive records is greater than 20 s is marked 1, and all other offline data is marked 0, which avoids the selection bias that later screening for continuous operating conditions would otherwise introduce;
2) after slice marking, removing the data collected by vehicle terminals with a very small data volume; the data volume is measured as the total sampling time of each vehicle terminal in one day, calculated from the sampling frequency of that terminal, and a vehicle terminal whose total sampling time in one day spans no more than half an hour is regarded as a terminal with a very small data volume;
3) removing abnormal data; a record is abnormal if it meets one or more of the following three conditions: the vehicle speed is greater than or equal to 200 km/h, the engine speed is negative, or the energy consumption is negative;
4) truncating the timestamp to the minute according to the time at which the vehicle terminal collected and uploaded the original message data (before conversion: 12:10:50, 12:10:20; after conversion: 12:10), so that each record is assigned to its minute;
5) for each terminal, rejecting the data of any minute whose slice marks include a 1; after aggregation, data whose slice marks include a 1 has poor continuity within that minute, differs greatly from normal operating conditions, and has no reference value.
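Steps 1), 4) and 5) above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout (a dict with a `time` field), the function names, and the choice to place the mark on the later of two records separated by a gap are all assumptions made for the example.

```python
from datetime import datetime

GAP_SECONDS = 20  # gap threshold stated in step 1) above


def mark_slices(records, gap_seconds=GAP_SECONDS):
    """Return (record, mark) pairs for one terminal's time-sorted records.

    The mark is 1 when the gap to the previous record exceeds gap_seconds,
    otherwise 0. Marking the later record is an assumption of this sketch.
    """
    marked = []
    previous_time = None
    for record in records:
        if previous_time is None:
            mark = 0
        else:
            gap = (record["time"] - previous_time).total_seconds()
            mark = 1 if gap > gap_seconds else 0
        marked.append((record, mark))
        previous_time = record["time"]
    return marked


def keep_continuous_minutes(records, gap_seconds=GAP_SECONDS):
    """Group records by minute and drop every minute that contains a mark of 1."""
    by_minute = {}
    for record, mark in mark_slices(records, gap_seconds):
        # Step 4): truncate the timestamp to the minute.
        minute = record["time"].replace(second=0, microsecond=0)
        bucket = by_minute.setdefault(minute, {"records": [], "broken": False})
        bucket["records"].append(record)
        bucket["broken"] = bucket["broken"] or mark == 1
    # Step 5): reject minutes whose slice marks include a 1.
    return {m: b["records"] for m, b in by_minute.items() if not b["broken"]}
```

The gap threshold is a parameter so that a different interval can be passed in without changing the logic.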
Further, the offline feature engineering extracts feature values of the original features from the preprocessed offline data and divides the extracted feature values into a training set and a validation set; the original features comprise vehicle speed, engine speed and energy consumption; the offline data is original data that has been temporarily stored for more than three days.
Further, the feature value extraction for energy consumption includes screening the energy consumption data, which specifically includes the following steps:
1) aggregating the minute-level data formed by the data preprocessing, with attention to the average engine speed of each vehicle in each minute;
2) partitioning the average engine speed into intervals; preferably, the average engine speed ranges from 0 to 2000 r/min and is divided into 200 small intervals of 10 r/min each;
3) plotting engine speed against energy consumption, calculating the 90th percentile of energy consumption within each engine speed interval, and taking the energy consumption values below that 90th percentile as data of a normal energy consumption level for training the energy consumption regression model.
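The screening in steps 2) and 3) can be illustrated with a short sketch. The field names `mean_rpm` and `energy`, the linear-interpolation percentile, the treatment of the range as the half-open interval [0, 2000), and keeping values equal to the cutoff are assumptions of this example; the source only states the interval width and the 90th percentile.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def screen_normal_energy(rows, bin_width=10, max_speed=2000, q=90):
    """Keep rows whose energy is at or below the q-th percentile of their bin.

    Rows are binned by mean engine speed in steps of bin_width r/min;
    rows outside [0, max_speed) are discarded.
    """
    bins = {}
    for row in rows:
        rpm = row["mean_rpm"]
        if 0 <= rpm < max_speed:
            bins.setdefault(int(rpm // bin_width), []).append(row)
    kept = []
    for members in bins.values():
        cutoff = percentile([r["energy"] for r in members], q)
        kept.extend(r for r in members if r["energy"] <= cutoff)
    return kept
```

With the preferred values, `bin_width=10` over 0 to 2000 r/min gives the 200 intervals described in step 2).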
Further, generating the XGBoost model through model training specifically includes: using the training set for model training, with missing values removed from the training set so that it contains none before training; taking energy consumption as the label and the feature values of the original features other than energy consumption, obtained from the offline feature engineering, as operating-condition features; because the label information is explicit, a supervised learning method is adopted and a regression model is built with the XGBoost algorithm to fit the relationship between energy consumption and the operating-condition features; the hyperparameters of the XGBoost algorithm are set to their default values.
Further, the monitoring indexes are specifically as follows: cross-validation is adopted during model training, and the relevant parameters of the validation set are recorded and used as monitoring indexes; they are obtained through the following steps:
1) feeding the validation set into the XGBoost model and obtaining the simulated values fitted by the XGBoost model;
2) summing the offline data within each minute to obtain the true energy consumption;
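3) calculating the mean square error of the validation set using the formula $MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, where $MSE$ is the mean square error, $y_i$ is the true value, $\hat{y}_i$ is the simulated value, and $n$ is the number of samples in the validation set;
4) calculating the mean absolute error of the validation set using the formula $MAE = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert$, where $MAE$ is the mean absolute error, $y_i$ is the true value, and $\hat{y}_i$ is the simulated value;
5) calculating the coefficient of determination of the validation set using the formula $R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$, where $R^2$ is the coefficient of determination, $y_i$ is the true value, $\hat{y}_i$ is the simulated value, and $\bar{y}$ is the mean of all true values.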
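The three monitoring indexes named in steps 3) to 5) (mean square error, mean absolute error and coefficient of determination) can be computed as in the sketch below. It assumes the standard textbook definitions of these quantities; the function name and the choice to return NaN when all true values are equal are illustrative assumptions.

```python
def validation_metrics(true_values, simulated_values):
    """Return (mse, mae, r2) for paired true and simulated energy values."""
    n = len(true_values)
    if n == 0 or n != len(simulated_values):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [y - p for y, p in zip(true_values, simulated_values)]
    squared_error_sum = sum(e * e for e in errors)
    mse = squared_error_sum / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(true_values) / n
    total_sum_of_squares = sum((y - mean_true) ** 2 for y in true_values)
    # R^2 is undefined when every true value is identical.
    if total_sum_of_squares == 0:
        r2 = float("nan")
    else:
        r2 = 1 - squared_error_sum / total_sum_of_squares
    return mse, mae, r2
```

For example, true values `[1, 2, 3]` against simulated values `[1, 2, 4]` give a mean square error and mean absolute error of 1/3 each and a coefficient of determination of 0.5.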
Further, the real-time data processing uses stream processing: the data is cleaned and preprocessed in real time in preparation for real-time feature engineering.
Further, the real-time feature engineering specifically includes: obtaining minute-level feature values of the real-time data for real-time prediction; the real-time prediction specifically includes: feeding the aggregated real-time data into the XGBoost model that was cross-validated in the training stage to obtain a predicted value, and comparing the deviation between the predicted value and the true value.
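As a rough illustration of how one minute of real-time records might be reduced to a minute-level feature row, the sketch below averages the speeds and sums the energy consumption. The aggregation actually used is defined in a table that is not reproduced in this text, so the field names and the mean/sum choices here are assumptions based only on the steps that mention averaging engine speed and summing energy within a minute.

```python
def aggregate_minute(records):
    """Collapse one minute of second-level records into one feature row.

    Assumed aggregation: mean of the speeds, sum of the energy consumption.
    """
    if not records:
        raise ValueError("need at least one record")
    n = len(records)
    return {
        "mean_vehicle_speed": sum(r["vehicle_speed"] for r in records) / n,
        "mean_engine_speed": sum(r["engine_speed"] for r in records) / n,
        "energy": sum(r["energy"] for r in records),
    }


def deviation(true_energy, predicted_energy):
    """Signed deviation of the true minute-level energy from the prediction."""
    return true_energy - predicted_energy
```

The feature row (without its `energy` entry) would be what is passed to the trained model, and `deviation` compares the model output with the summed true value.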
Further, the visual display of the predicted value specifically includes: plotting the predicted values and the true values as points in a rectangular coordinate system with time on the horizontal axis and the predicted and true values on the vertical axis, and displaying the chart on a visualization platform; a point whose true value is more than 1.2 times the predicted value is treated as an abnormal point.
The anomaly monitoring alarm avoids letting the error of a single minute-level data point affect the judgment, so statistics are compiled over a daily period: when more than 20% of a vehicle's data points are abnormal, the energy consumption level of the vehicle is considered to have deviated, and an alarm is raised for the vehicle with abnormal energy consumption.
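The abnormal-point rule and the daily alarm rule can be sketched together. The 1.2 ratio and the 20% share come from the text; reading the rule as "true value greater than 1.2 times the predicted value", the function names, and returning no alarm for a day without data are assumptions of this example.

```python
ABNORMAL_RATIO = 1.2      # true value above 1.2 x prediction counts as abnormal
DAILY_ALARM_SHARE = 0.20  # alarm when more than 20% of the day's points are abnormal


def is_abnormal(true_value, predicted_value, ratio=ABNORMAL_RATIO):
    """Flag one minute-level point."""
    return true_value > ratio * predicted_value


def daily_alarm(minute_points, ratio=ABNORMAL_RATIO, share=DAILY_ALARM_SHARE):
    """Decide whether one vehicle should raise an alarm for one day.

    minute_points is a list of (true_value, predicted_value) pairs.
    """
    if not minute_points:
        return False
    abnormal = sum(1 for t, p in minute_points if is_abnormal(t, p, ratio))
    return abnormal / len(minute_points) > share
```

Judging on the daily share rather than on single points is what keeps one noisy minute from triggering an alarm.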
The invention has the advantages that:
1) the invention carries out vehicle energy consumption evaluation in the current big data setting, can monitor the energy consumption level of a vehicle in real time, provides a reliable basis for vehicle troubleshooting through its early warning system, avoids to a certain extent the energy waste caused by undetected abnormal energy consumption, and reduces the risk of accidents caused by engine faults that are not found in time;
2) based on Internet of vehicles big data and machine learning technology, a reasonable analysis model can be built on top of the collection, storage and computation of massive data; the two technologies complement each other, which further improves the accuracy of the model's analysis results and reduces the detection cost.
Drawings
FIG. 1 is a flow chart of the overall implementation of the present invention.
FIG. 2 is a detailed flow chart of the training phase.
FIG. 3 is a flow chart of an embodiment of the prediction phase.
Detailed Description
The present invention will be described in detail with reference to the following embodiments.
A vehicle energy consumption evaluation method based on Internet of vehicles big data comprises a data acquisition stage, a training stage and a prediction stage. The data acquisition stage comprises: step 1-1) acquiring data to obtain original data, step 1-2) cleaning the data, and step 1-3) temporarily storing the data. The training stage comprises: step 2-1) data preprocessing, step 2-2) offline feature engineering, step 2-3) model training to generate an XGBoost model, step 2-4) monitoring indexes, and step 2-5) visual display of the training results. The prediction stage comprises: step 3-1) real-time data processing, step 3-2) real-time feature engineering, step 3-3) real-time prediction, step 3-4) visual display of the predicted value, and step 3-5) anomaly monitoring and alarming. The training stage is the basis of the prediction stage, and the prediction stage predicts on the basis of the XGBoost model generated in the training stage.
Acquiring data to obtain original data specifically comprises: transmitting and collecting in real time the original message data collected by the vehicle terminal, and parsing it in batches to obtain the original data; different batches of original message data are parsed under different parsing standards, and unreasonable data and null data are present, so data cleaning is needed.
The data cleaning specifically comprises: removing unreasonable data and null data from the original data obtained by data acquisition, and extracting the corresponding problem data and naming it uniformly. Unreasonable data is data outside the normal theoretical range, for example a vehicle speed greater than or equal to 200 km/h. Null data is data for which no value was acquired: network transmission faults, terminal faults and similar causes produce dirty data and the loss of a small amount of data. Because changes in the parsing standard give the same data item different names, the corresponding problem data must be extracted and named uniformly.
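A minimal sketch of this cleaning step follows. The alias table, the field names and the set of required fields are hypothetical and chosen only to illustrate uniform naming; only the 200 km/h limit and the removal of null values come from the text.

```python
# Hypothetical aliases produced by different parsing standards.
ALIASES = {"spd": "vehicle_speed", "speed": "vehicle_speed", "rpm": "engine_speed"}
# Hypothetical set of fields that must carry a value.
REQUIRED = ("vehicle_speed", "engine_speed", "energy")


def unify_names(record, aliases=ALIASES):
    """Rename fields so the same data item always has the same name."""
    return {aliases.get(key, key): value for key, value in record.items()}


def clean(records):
    """Drop records with null required fields or an unreasonable vehicle speed."""
    cleaned = []
    for raw in records:
        record = unify_names(raw)
        if any(record.get(field) is None for field in REQUIRED):
            continue  # null data: a value was not acquired
        if record["vehicle_speed"] >= 200:
            continue  # unreasonable data: speed of 200 km/h or more
        cleaned.append(record)
    return cleaned
```

Renaming before filtering means the checks only need to know one name per data item, whichever parsing standard produced the record.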
The temporary data storage specifically includes: temporarily storing the cleaned original data in a temporary storage module. Because the training stage needs offline data and the prediction stage needs to predict on real-time data, the temporary storage module buffers the cleaned original data, which prevents data loss and allows the data pulling rate to be adjusted automatically according to processing capacity.
The data preprocessing specifically comprises the following processes:
1) marking the continuity of the offline data with slice marks: offline data for which the time interval between two consecutive records is greater than 30 s is marked 1, and all other offline data is marked 0, which avoids the selection bias that later screening for continuous operating conditions would otherwise introduce;
2) after slice marking, removing the data collected by vehicle terminals with a very small data volume; the data volume is measured as the total sampling time of each vehicle terminal in one day, calculated from the sampling frequency of that terminal, and a vehicle terminal whose total sampling time in one day spans no more than half an hour is regarded as a terminal with a very small data volume; the data collected by such terminals differs greatly in distribution and would interfere with the training of the XGBoost model;
3) removing abnormal data; a record is abnormal if it meets one or more of the following three conditions: the vehicle speed is greater than or equal to 200 km/h, the engine speed is negative, or the energy consumption is negative;
4) truncating the timestamp to the minute, ignoring the seconds, according to the time at which the vehicle terminal collected and uploaded the original message data (before conversion: 12:10:50, 12:10:20; after conversion: 12:10), for use in subsequent operations; the offline data within the same minute must be aggregated when the XGBoost model is built later, and the aggregation methods are shown in Table 1;
5) for each terminal, rejecting the data of any minute whose slice marks include a 1; after aggregation, data whose slice marks include a 1 has poor continuity within that minute, differs greatly from normal operating conditions, and has no reference value.
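Steps 2) and 3) of this list can be sketched as follows. The sketch assumes that a terminal's total sampling time is its record count multiplied by its sampling period in seconds, which is one reading of "calculated from the sampling frequency"; the data layout and function names are likewise illustrative.

```python
HALF_HOUR_SECONDS = 30 * 60


def is_abnormal_record(record):
    """True if any of the three abnormal conditions of step 3) holds."""
    return (
        record["vehicle_speed"] >= 200
        or record["engine_speed"] < 0
        or record["energy"] < 0
    )


def filter_terminals(records_by_terminal, period_by_terminal,
                     min_seconds=HALF_HOUR_SECONDS):
    """Drop low-volume terminals, then drop abnormal records.

    records_by_terminal maps a terminal id to one day's records;
    period_by_terminal maps a terminal id to its sampling period in seconds.
    """
    kept = {}
    for terminal, records in records_by_terminal.items():
        sampled_seconds = len(records) * period_by_terminal[terminal]
        if sampled_seconds <= min_seconds:
            continue  # total sampling time of half an hour or less
        kept[terminal] = [r for r in records if not is_abnormal_record(r)]
    return kept
```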
Data preprocessing is needed because, owing to the driver's personal behavior, abnormal terminal states, network transmission and other causes, the data collected by some vehicle terminals is incomplete, and discontinuous data is very common; therefore, before a model is trained, the offline data must be preprocessed so that it can be used for offline feature engineering and model training.
Through the data preprocessing process, the data basically meet the requirements of offline feature engineering, and then a training set and a verification set are constructed through the offline feature engineering to serve as the basis of model training.
The offline feature engineering extracts feature values of the original features from the preprocessed offline data and divides the extracted feature values into a training set and a validation set; the original features comprise vehicle speed, engine speed and energy consumption; the offline data is original data that has been temporarily stored for more than three days. Further, the original features also include the reciprocal of the transmission ratio (gear), ambient temperature, atmospheric humidity, atmospheric pressure, engine oil temperature, engine water temperature, actual total torque percentage, friction torque percentage, engine net output torque, actual torque percentage and fan speed. Extracting feature values from the original features of the offline data makes model training easier, and the extracted feature values are divided into a training set and a validation set for cross-validation.
The training set is constructed in order to train a model that can detect vehicles whose energy consumption exceeds the normal level under operating conditions similar to the extracted feature values. Because a more direct evaluation standard for the energy consumption level is lacking, the method adopts a cascade model approach: vehicle data with a normal energy consumption level is selected as the training set, and the relationship between energy consumption and the vehicle's running state and operating conditions is modeled. Preferably, the cascade model is built with the XGBoost algorithm, which is a cascade model of decision trees.
The method for extracting the feature values is shown in Table 1. The variables most strongly related to the energy consumption level are engine speed and vehicle speed, so these two feature values are indispensable, and energy consumption is likewise indispensable as the supervised learning label. The feature values of the other original features also influence the energy consumption level to some degree and can be adjusted according to the original data actually collected by the terminal. The energy consumption data must be screened by engine speed level.
The feature value extraction for energy consumption includes screening the energy consumption data, which specifically includes the following steps:
1) aggregating the minute-level data formed by the data preprocessing, with attention to the average engine speed of each vehicle in each minute;
2) partitioning the average engine speed into intervals; preferably, the average engine speed ranges from 0 to 2000 r/min and is divided into 200 small intervals of 10 r/min each;
3) plotting engine speed against energy consumption, calculating the 90th percentile of energy consumption within each engine speed interval, and taking the energy consumption values below that 90th percentile as data of a normal energy consumption level for training the energy consumption regression model.
Through this processing, the training set that enters model training consists, as far as possible, of data with a normal energy consumption level, which avoids the errors that abnormal data would introduce when the cascade model is built. The training set produced by the offline feature engineering is the training data for the XGBoost model: energy consumption is the target variable, and the remaining variables are feature variables describing the vehicle's operating conditions and overall state, namely the feature values of original features such as engine speed and vehicle speed. The model captures the relationship between energy consumption and these feature variables at a normal energy consumption level. Missing values arise when second-level data is aggregated to the minute level; they may harm training and must be filtered out before model training. Data below the 90th percentile is selected as data with a normal energy consumption level. The validation set consists of the feature value data obtained from the offline feature engineering that is not covered by the training set, and it is used to compare the effect of the cascade model.
Generating the XGBoost model through model training specifically comprises the following steps: missing values are removed from all feature sets, where a feature set is the set of feature values extracted and processed in the offline feature engineering, so that no missing value remains before the model is trained; eighty percent of the processed feature set is taken as the training set of the model; energy consumption is used as the label, and the feature values of the original features other than energy consumption, obtained from the offline feature engineering, are used as working-condition features; because the label information is explicit, a supervised learning method is adopted, and a regression model is constructed with the XGBoost algorithm to fit the relationship between energy consumption and the working-condition features; the hyper-parameters of the XGBoost algorithm are set to their default values.
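A minimal sketch of this training step follows. It assumes each row is a tuple of working-condition feature values with the energy consumption label in the last position; the helper names `split_complete_rows` and `train_energy_model` are hypothetical, and the random shuffle before the 80/20 split is an assumption, since the text does not say how the eighty percent is chosen. When no model is passed in, the sketch falls back to `XGBRegressor()` from the `xgboost` package, which uses that library's default hyper-parameters.

```python
import math
import random


def split_complete_rows(rows, train_fraction=0.8, seed=0):
    """Drop rows with missing values, then split into train / validation."""
    complete = [
        row for row in rows
        if all(v is not None and not (isinstance(v, float) and math.isnan(v))
               for v in row)
    ]
    rng = random.Random(seed)
    rng.shuffle(complete)  # assumption: a random 80/20 split
    cut = int(len(complete) * train_fraction)
    return complete[:cut], complete[cut:]


def train_energy_model(train_rows, model=None):
    """Fit a regressor; last column is the energy label, the rest are features."""
    features = [list(row[:-1]) for row in train_rows]
    labels = [row[-1] for row in train_rows]
    if model is None:
        # Assumption: numpy and xgboost are installed.  XGBRegressor() with
        # no arguments keeps the library's default hyper-parameters.
        import numpy as np
        from xgboost import XGBRegressor
        model = XGBRegressor()
        features, labels = np.asarray(features), np.asarray(labels)
    model.fit(features, labels)
    return model
```

Passing the model in as a parameter keeps the data handling testable without the `xgboost` dependency; any object with `fit` and `predict` methods can stand in for it.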
The monitoring indexes are specifically as follows: cross validation is adopted during model training, and the relevant parameters of the validation set are recorded and used as monitoring indexes. These parameters are obtained through the following steps:
1) substituting the verification set into the XGBoost model and obtaining the simulated (fitted) values from the XGBoost model;
2) summing the offline data within each minute to obtain the real energy consumption;
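3) using the formula $MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ to calculate the mean square error of the validation set, where $MSE$ is the mean square error, $y_i$ is the true value, $\hat{y}_i$ is the simulated value, and $n$ is the number of samples in the validation set;
4) using the formula $MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$ to calculate the mean absolute error of the validation set, where $MAE$ is the mean absolute error, $y_i$ is the true value, and $\hat{y}_i$ is the simulated value;
5) using the formula $R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$ to calculate the coefficient of determination of the verification set, where $R^2$ is the coefficient of determination, $y_i$ is the true value, $\hat{y}_i$ is the simulated value, and $\bar{y}$ is the mean of all true values.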
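The three monitoring indexes can be computed as in the following sketch, which uses the standard definitions of mean square error, mean absolute error, and coefficient of determination. The function name `validation_metrics` and the dictionary keys are illustrative choices, not names from the patent.

```python
def validation_metrics(y_true, y_pred):
    """Return MSE, MAE and R^2 for paired true and simulated values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    return {
        "mse": ss_res / n,
        "mae": sum(abs(e) for e in errors) / n,
        # R^2 is undefined when every true value is identical.
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
    }
```

For true values 1, 2, 3, 4 and simulated values 1, 2, 3, 5, this gives an MSE of 0.25, an MAE of 0.25, and an R² of 0.8.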
These monitoring indexes make it possible to check whether the XGBoost model has problems; the accuracy of the XGBoost model directly affects the results of the prediction stage.
The main purpose of visually displaying the training result is to observe more intuitively the reliability of the XGBoost model obtained through training. The performance of the XGBoost model on the verification set serves as the criterion for judging the model's effect; data points are logged during model training, and a corresponding report is built on the visualization platform to display the training result. The XGBoost model must not only pass verification on the verification set; real-time data must also be substituted into the XGBoost model at least three times, and the reliability of the model is verified by observing the resulting monitoring indexes.
The real-time data processing uses stream processing; the real-time data are cleaned and preprocessed in preparation for the real-time feature engineering.
The real-time feature engineering specifically comprises: obtaining the minute-level feature values of the real-time data for real-time prediction. The real-time prediction specifically comprises: substituting the combined real-time data into the XGBoost model that was cross-validated in the training stage to obtain a predicted value, and comparing the deviation between the predicted value and the true value.
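The streaming flow of minute-level feature extraction followed by prediction might look like the sketch below. The record layout (timestamp, vehicle speed, engine speed, energy consumption), the choice of mean speed and mean engine speed as features, and the names `minute_stream` and `predict_stream` are all assumptions made for illustration; `model` is any object with a `predict` method, such as a trained regressor.

```python
from datetime import datetime
from statistics import mean


def _summarise(minute, rows):
    """Collapse one minute of rows into (minute, features, real energy)."""
    speeds, rpms, energies = zip(*rows)
    return minute, [mean(speeds), mean(rpms)], sum(energies)


def minute_stream(records):
    """Group a time-ordered stream by minute, yielding as each minute closes.

    records: iterable of (timestamp, speed, rpm, energy) tuples.
    """
    current, rows = None, []
    for ts, speed, rpm, energy in records:
        minute = ts.replace(second=0, microsecond=0)
        if current is not None and minute != current:
            yield _summarise(current, rows)
            rows = []
        current = minute
        rows.append((speed, rpm, energy))
    if rows:  # flush the last, possibly incomplete, minute
        yield _summarise(current, rows)


def predict_stream(records, model):
    """Yield (minute, real, predicted, deviation) for each minute."""
    for minute, features, real in minute_stream(records):
        predicted = float(model.predict([features])[0])
        yield minute, real, predicted, real - predicted
```

Because both functions are generators, each minute is scored as soon as the first record of the next minute arrives, without buffering the whole stream.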
The visual display of the predicted value specifically comprises: plotting the predicted value and the real value in a rectangular coordinate system, with time on the horizontal axis and the predicted and real values on the vertical axis, displaying the chart on the visualization platform, and taking any point whose real value is more than 1.2 times the predicted value as an abnormal point.
The abnormality monitoring alarm is designed so that the error of a single minute-level data point does not distort the judgment: statistics are compiled over a daily period, and when more than 20% of the data in a day are abnormal, the energy consumption level of the vehicle is considered to have deviated, so that an alarm can be raised for the vehicle with abnormal energy consumption.
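The abnormal-point rule and the daily alarm rule can be combined in a short sketch. It assumes one vehicle's minute-level points arrive as (timestamp, real value, predicted value) tuples; the function name `daily_alarms` and its parameter names are hypothetical, while the 1.2 ratio and the 20% share come from the text.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def daily_alarms(
    points: Iterable[Tuple[datetime, float, float]],
    ratio: float = 1.2,
    day_share: float = 0.2,
) -> Dict:
    """Map each calendar day to True when an alarm should be raised.

    A point is abnormal when real > ratio * predicted; a day triggers an
    alarm when more than `day_share` of its points are abnormal.
    """
    totals = defaultdict(int)
    abnormal = defaultdict(int)
    for ts, real, predicted in points:
        day = ts.date()
        totals[day] += 1
        if real > ratio * predicted:
            abnormal[day] += 1
    return {day: abnormal[day] / totals[day] > day_share for day in totals}
```

A day with 3 abnormal points out of 10 triggers an alarm, whereas a day with exactly 2 out of 10 does not, because the share must exceed 20%.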
The aggregation modes used when converting the timestamps to the corresponding minute level are listed in Table 1:
The maximum, minimum, average, median, and sum in Table 1 mean taking, respectively, the maximum, minimum, mean, median, and sum of the data of each processed feature within the same minute.
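As an illustration of these aggregation modes, the sketch below collapses second-level samples of a single signal into minute-level values. Which aggregate applies to which feature is specified by Table 1, so the sketch simply computes all five; the function name `aggregate_by_minute` and the sample layout are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median


def aggregate_by_minute(samples):
    """Aggregate (timestamp, value) samples into per-minute statistics.

    Timestamps are truncated to the minute; every sample falling in the
    same minute contributes to that minute's aggregates.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts.replace(second=0, microsecond=0)].append(value)

    return {
        minute: {
            "max": max(values),
            "min": min(values),
            "mean": mean(values),
            "median": median(values),
            "sum": sum(values),
        }
        for minute, values in sorted(buckets.items())
    }
```

For example, samples 1, 2, 3, and 6 recorded within the same minute yield a maximum of 6, a minimum of 1, a mean of 3, a median of 2.5, and a sum of 12.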
The overall implementation flow of the method is shown in Figure 1. The original message data collected by the terminal are parsed to obtain the original data, and preliminary data cleaning is then carried out. The collected data differ from terminal to terminal, but all measuring points used in the method are national-standard data. The data are placed in a temporary storage module; the offline data stored locally are then used for analysis and modeling, the XGBoost model obtained through model training is used to make real-time predictions on the processed real-time data, and the predicted values are compared with the real values, formed by the energy consumption among the feature values, to find equipment with abnormal energy consumption. The temporary storage module mainly holds the collected data temporarily, which avoids data loss and makes it convenient to carry out offline and real-time processing at the same time. Model training as a whole takes a cascade form: data preprocessing is carried out first, then the XGBoost algorithm is used to build the relationship between energy consumption and the feature values at a normal energy consumption level and obtain a predicted value, and whether the vehicle is at an abnormal energy consumption level is judged from the difference between the predicted value and the real energy consumption, which is represented by the minute-level energy consumption obtained from the real-time feature engineering.
Some terms in the present invention are defined as follows:
Abnormal energy consumption: the energy consumption is higher than the normal level while the vehicle is running.
Supervised learning: a method of constructing models from labeled data, which can be used to build classification or regression models.
Offline data: collected data that have been stored; preferably, data stored for more than three days are taken as offline data.
Real-time data: data collected online and having real-time character.
Cascade model: a common method of model integration, characterized in that the model of the next stage uses the output of the model of the previous stage, which may be the model's output result or a feature after secondary processing.
Claims (10)
1. A vehicle energy consumption evaluation method based on Internet of vehicles big data is characterized by comprising a data acquisition stage, a training stage and a prediction stage; the data acquisition phase comprises: step 1-1) acquiring data to obtain original data, step 1-2) cleaning the data, and step 1-3) temporarily storing the data; the training phase comprises: step 2-1) data preprocessing, step 2-2) off-line feature engineering, step 2-3) model training is carried out to generate an XGboost model, step 2-4) indexes are monitored, and step 2-5) training results are displayed in a visual mode; the prediction phase comprises: step 3-1) real-time data processing, step 3-2) real-time feature engineering, step 3-3) real-time prediction, step 3-4) visual display of a predicted value, and step 3-5) abnormal monitoring and alarming; the training phase is the basis of the prediction phase, and the prediction phase carries out prediction on the basis of the XGboost model generated in the training phase.
2. The vehicle energy consumption evaluation method based on the internet of vehicles big data as claimed in claim 1, wherein the data acquisition to obtain the raw data specifically comprises: the method comprises the steps that original message data collected by a vehicle terminal are transmitted and collected in real time, and the original data are obtained through batch analysis; the data cleaning specifically comprises: unreasonable data and null data in original data acquired by data acquisition are removed; the temporary data storage specifically includes: performing data temporary storage on the original data subjected to data cleaning by adopting a temporary storage module; and temporarily storing the original data for more than three days as offline data.
3. The vehicle energy consumption evaluation method based on the internet of vehicles big data as claimed in claim 2, wherein the data preprocessing specifically comprises the following processes:
1) marking the continuity of the offline data by slicing: where the time interval between two consecutive offline data records is greater than 20 s, the data are marked as 1, and the other offline data are marked as 0;
2) after the offline data are slice-marked, removing the data collected by vehicle terminals with a very small data volume; the data volume is determined by calculating each vehicle terminal's total sampling time in one day from the sampling frequency of that vehicle terminal, and a vehicle terminal whose total sampling time in one day does not exceed half an hour is regarded as a vehicle terminal with a very small data volume;
3) eliminating abnormal data; abnormal data are data that meet one, two, or all three of the following conditions: the vehicle speed is greater than or equal to 200 km/h, the engine speed is negative, and the energy consumption is negative;
4) according to the time at which the vehicle terminal collected and uploaded the original message data, truncating the timestamp to minute precision and converting it to the corresponding minute level;
5) eliminating the data of any minute, for the corresponding terminal, whose slice marks contain 1.
4. The vehicle energy consumption evaluation method based on the internet of vehicles big data as claimed in claim 2, wherein the off-line feature engineering is to extract feature values of original features from off-line data after data preprocessing, and divide the extracted feature values into a training set and a verification set; the raw characteristics include vehicle speed, engine speed, energy consumption.
5. The vehicle energy consumption evaluation method based on the internet of vehicles big data as claimed in claim 4, wherein the characteristic value extraction processing of the energy consumption comprises screening the energy consumption data, and the screening of the energy consumption data specifically comprises the following steps:
1) aggregating the minute-level data formed after the data preprocessing, focusing on the average engine speed of each vehicle in each minute;
2) partitioning the average engine speed into intervals;
3) plotting the relationship between engine speed and energy consumption, calculating the 90th percentile of energy consumption within each engine speed interval, and taking the energy consumption data below that 90th percentile as normal-energy-consumption-level data for training the energy consumption regression model.
6. The vehicle energy consumption evaluation method based on the internet of vehicles big data according to claim 4, wherein the model training for generating the XGboost model specifically comprises: using the training set for model training, and eliminating missing values in the training set to ensure that no missing value is included before model training; the energy consumption is used as a label, the characteristic value of the original characteristic except the energy consumption obtained after off-line characteristic engineering is used as a working condition characteristic, a supervised learning method is adopted, and an XGboost algorithm is used for constructing a regression model to fit the relevance between the energy consumption and the working condition characteristic; the hyper-parameter of the XGboost algorithm is set to a default value.
7. The vehicle energy consumption evaluation method based on the internet of vehicles big data according to claim 4, wherein the monitoring index is specifically as follows: in the model training process, a cross validation method is adopted, relevant parameters of a validation set are recorded, the relevant parameters of the validation set are used as monitoring indexes, and the relevant parameters of the validation set are obtained through the following steps:
1) substituting the verification set into the XGBoost model and obtaining the simulated (fitted) values from the XGBoost model;
2) summing the offline data within each minute to obtain the real energy consumption;
3) using the formula $MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ to calculate the mean square error of the validation set, where $MSE$ is the mean square error, $y_i$ is the true value, $\hat{y}_i$ is the simulated value, and $n$ is the number of samples in the validation set;
4) using the formula $MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$ to calculate the mean absolute error of the validation set, where $MAE$ is the mean absolute error, $y_i$ is the true value, and $\hat{y}_i$ is the simulated value;
8. The vehicle energy consumption evaluation method based on the internet of vehicles big data as claimed in claim 1, wherein the real-time data processing uses stream processing, and the real-time data are cleaned and preprocessed in preparation for the real-time feature engineering.
9. The vehicle energy consumption evaluation method based on the internet of vehicles big data according to claim 7, wherein the real-time feature engineering specifically comprises: obtaining the minute-level feature values of the real-time data for real-time prediction; the real-time prediction specifically comprises: substituting the real-time data into the XGBoost model that was cross-validated in the training stage to obtain a predicted value, and comparing the deviation between the predicted value and the true value.
10. The vehicle energy consumption evaluation method based on the internet of vehicles big data according to claim 9, wherein the visual display of the predicted value is specifically as follows: the predicted value and the real value are plotted in a rectangular coordinate system with time on the horizontal axis and the predicted and real values on the vertical axis, the chart is displayed on a visualization platform, and any point whose real value is more than 1.2 times the predicted value is taken as an abnormal point; the abnormality monitoring alarm compiles statistics over a daily period, and when more than 20% of the data are abnormal, the energy consumption level of the vehicle is considered to have deviated, so that an alarm is raised for the vehicle with abnormal energy consumption.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110248864.5A CN112633781B (en) | 2021-03-08 | 2021-03-08 | Vehicle energy consumption evaluation method based on Internet of vehicles big data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112633781A | 2021-04-09 |
CN112633781B | 2021-06-08 |
Family
ID=75297621
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110248864.5A (Active; CN112633781B) | 2021-03-08 | 2021-03-08 | Vehicle energy consumption evaluation method based on Internet of vehicles big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112633781B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109558988A (en) * | 2018-12-13 | 2019-04-02 | 北京理工新源信息科技有限公司 | A kind of electric car energy consumption prediction technique and system based on big data fusion |
CN111275288A (en) * | 2019-12-31 | 2020-06-12 | 华电国际电力股份有限公司十里泉发电厂 | XGboost-based multi-dimensional data anomaly detection method and device |
CN111723944A (en) * | 2020-05-29 | 2020-09-29 | 北京熙诚紫光科技有限公司 | CHF prediction method and device based on multiple machine learning |
CN111832101A (en) * | 2020-06-18 | 2020-10-27 | 湖北博华自动化系统工程有限公司 | Construction method of cement strength prediction model and cement strength prediction method |
CN112200932A (en) * | 2020-09-03 | 2021-01-08 | 北京蜂云科创信息技术有限公司 | Method and equipment for evaluating energy consumption of heavy-duty diesel vehicle |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113741196A (en) * | 2021-09-14 | 2021-12-03 | 江苏海平面数据科技有限公司 | DPF regeneration period control optimization method based on Internet of vehicles big data |
CN114722102A (en) * | 2022-04-24 | 2022-07-08 | 武汉北曦盛科技有限公司 | Intelligent monitoring and management system for rail transit energy consumption system based on big data analysis |
CN117389791A (en) * | 2023-12-13 | 2024-01-12 | 江苏海平面数据科技有限公司 | Abnormal energy consumption attribution method for diesel vehicle |
CN117389791B (en) * | 2023-12-13 | 2024-02-23 | 江苏海平面数据科技有限公司 | Abnormal energy consumption attribution method for diesel vehicle |
Also Published As
Publication number | Publication date |
---|---|
CN112633781B (en) | 2021-06-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||