CN112633781A - Vehicle energy consumption evaluation method based on Internet of vehicles big data - Google Patents

Vehicle energy consumption evaluation method based on Internet of vehicles big data

Info

Publication number
CN112633781A
Authority
CN
China
Prior art keywords
data
energy consumption
vehicle
real
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110248864.5A
Other languages
Chinese (zh)
Other versions
CN112633781B (en)
Inventor
王欣然
陈智也
孙杰
沈祥红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Sea Level Data Technology Co ltd
Original Assignee
Jiangsu Sea Level Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Sea Level Data Technology Co ltd filed Critical Jiangsu Sea Level Data Technology Co ltd
Priority to CN202110248864.5A priority Critical patent/CN112633781B/en
Publication of CN112633781A publication Critical patent/CN112633781A/en
Application granted granted Critical
Publication of CN112633781B publication Critical patent/CN112633781B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477 Temporal data queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention provides a vehicle energy consumption evaluation method based on Internet of vehicles big data, comprising a data acquisition stage, a training stage and a prediction stage. The data acquisition stage comprises: acquiring data to obtain original data, cleaning the data, and temporarily storing the data. The training stage comprises: data preprocessing, offline feature engineering, model training to generate an XGBoost model, monitoring indexes, and visually displaying the training results. The prediction stage comprises: real-time data processing, real-time feature engineering, real-time prediction, visual display of the predicted values, and anomaly monitoring and alarm. The training stage is the basis of the prediction stage, which predicts using the XGBoost model generated in the training stage. The invention can monitor the energy consumption level of a vehicle in real time and, through its early-warning system, provides a reliable basis for vehicle troubleshooting.

Description

Vehicle energy consumption evaluation method based on Internet of vehicles big data
Technical Field
The invention relates to a vehicle energy consumption evaluation method based on Internet of vehicles big data, and belongs to the field of vehicle energy consumption evaluation.
Background
With the continuous development of science and technology, engines are involved in more and more fields, engines of many types and purposes are being actively researched, and people pay increasing attention to vehicle economy. The conventional automobile industry already has energy consumption analysis techniques, but most of them sample, analyze and evaluate a single vehicle or vehicles of a single brand, and require the vehicle to be driven to a specific site for testing.
In recent years, big data has expanded rapidly in every industry and the data generated by industrial applications has grown explosively. The Internet of Things has developed alongside it, and Internet of vehicles big data has also begun to reach scale, but applications that combine big data with machine learning to evaluate vehicle energy consumption are still lacking. Defining the energy consumption level of vehicles by sampling and evaluating a small number of them can reveal some problems, but the approach is limited and its constraints are strict, so it cannot be applied on a large scale.
The traditional vehicle energy consumption evaluation method needs a large amount of offline sampling and testing to make its results general and accurate, yet it cannot monitor in real time whether the energy consumption of each vehicle is abnormal; to check the energy consumption level of a single vehicle, the vehicle has to be taken to a repair shop or a testing institution. Big data technology can also address the situation in which vehicle energy consumption cannot be evaluated because the factors affecting it are numerous and a complete, scientific method of data processing and analysis is lacking. By combining real-time big data processing with machine learning, the energy consumption level of each vehicle can be monitored in real time, overcoming the time and labor costs of the traditional detection method.
Disclosure of Invention
The invention provides a vehicle energy consumption evaluation method based on Internet of vehicles big data, and aims to solve the problem that the vehicle energy consumption level cannot be monitored in real time in the prior art.
The technical solution of the invention is as follows: a vehicle energy consumption evaluation method based on Internet of vehicles big data comprises a data acquisition stage, a training stage and a prediction stage. The data acquisition stage comprises: step 1-1) acquiring data to obtain original data, step 1-2) cleaning the data, and step 1-3) temporarily storing the data. The training stage comprises: step 2-1) data preprocessing, step 2-2) offline feature engineering, step 2-3) model training to generate an XGBoost model, step 2-4) monitoring indexes, and step 2-5) visually displaying the training results. The prediction stage comprises: step 3-1) real-time data processing, step 3-2) real-time feature engineering, step 3-3) real-time prediction, step 3-4) visual display of the predicted values, and step 3-5) anomaly monitoring and alarm. The training stage is the basis of the prediction stage, which predicts using the XGBoost model generated in the training stage.
Further, acquiring data to obtain original data specifically includes: transmitting and collecting in real time the original message data collected by the vehicle terminals, and parsing the collected original message data in batches to obtain the original data.
Further, the data cleaning specifically includes: removing unreasonable data and null data from the original data obtained by data acquisition.
Further, temporarily storing the data specifically includes: using a temporary storage module to temporarily store the cleaned original data; original data that has been temporarily stored for more than three days is used as offline data.
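By way of illustration only, the separation of offline data from more recent data may be sketched as follows. The record layout (a dict with a `ts` timestamp field) and the function name `split_offline` are assumptions made for this sketch and are not part of the disclosure; the three-day threshold is the one stated above, read here as the age of a stored record.

```python
from datetime import datetime, timedelta

# Threshold stated in the text: data stored for more than three days is offline data.
OFFLINE_AGE = timedelta(days=3)

def split_offline(records, now):
    """Return (offline, recent): records older than three days and all others.

    Each record is assumed to be a dict with a "ts" datetime (hypothetical layout).
    """
    offline = [r for r in records if now - r["ts"] > OFFLINE_AGE]
    recent = [r for r in records if now - r["ts"] <= OFFLINE_AGE]
    return offline, recent
```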
Further, the data preprocessing specifically includes the following processes:
1) marking the continuity of the offline data with slice marks: when the time interval between two consecutive offline records is larger than 20 s, the record is marked 1, and all other offline records are marked 0, so as to avoid the selection bias that later screening for continuous working conditions would otherwise introduce;
2) after the slice marking, removing the data collected by vehicle terminals with a very small data volume; the data volume is measured as the total sampling time of each vehicle terminal in one day, calculated from the sampling frequency of that terminal, and a terminal whose total sampling time in one day does not exceed half an hour is regarded as a terminal with a very small data volume;
3) eliminating abnormal data; a record is abnormal when one or more of the following three conditions hold: the vehicle speed is greater than or equal to 200 km/h, the engine speed is negative, or the energy consumption is negative;
4) truncating the time stamp at which the vehicle terminal collected and uploaded the original message data to minute precision (before conversion: 12:10:50, 12:10:20; after conversion: 12:10), so that each record is assigned to its minute;
5) rejecting, for the corresponding terminal, the data of any minute whose slice marks include a 1; when the aggregated marks of a minute include a 1, the data in that minute are poorly continuous, differ greatly from conventional working conditions, and have no reference value.
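By way of illustration only, steps 1), 4) and 5) above may be sketched for a single terminal as follows. The function names are hypothetical, the gap threshold is exposed as a parameter, and placing the mark on the later of the two consecutive records is an assumption of this sketch, since the text does not say which record carries the mark.

```python
from datetime import datetime

def mark_slices(timestamps, max_gap_s=20):
    """Mark a sample 1 if the gap to the previous sample exceeds max_gap_s, else 0.

    Assumption: the later of the two consecutive records carries the mark.
    """
    flags = [0] * len(timestamps)
    for i in range(1, len(timestamps)):
        gap = (timestamps[i] - timestamps[i - 1]).total_seconds()
        if gap > max_gap_s:
            flags[i] = 1
    return flags

def continuous_minutes(timestamps, max_gap_s=20):
    """Truncate time stamps to the minute and keep minutes with no mark of 1."""
    flags = mark_slices(timestamps, max_gap_s)
    by_minute = {}
    for ts, flag in zip(timestamps, flags):
        minute = ts.replace(second=0, microsecond=0)  # 12:10:50 -> 12:10
        by_minute.setdefault(minute, []).append(flag)
    return sorted(m for m, fl in by_minute.items() if 1 not in fl)
```

With samples at 12:10:20, 12:10:50, 12:11:00 and 12:11:15, the 30 s gap marks the second sample, so minute 12:10 is rejected and only minute 12:11 is kept.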
Further, the offline feature engineering extracts feature values from the original features of the preprocessed offline data and divides the extracted feature values into a training set and a verification set; the original features comprise vehicle speed, engine speed and energy consumption; the offline data is original data that has been temporarily stored for more than three days.
Further, the feature value extraction for energy consumption includes screening the energy consumption data, which specifically comprises the following steps:
1) aggregating the minute-level data formed by the data preprocessing, focusing on the average engine speed of each vehicle in each minute;
2) partitioning the average engine speed into intervals; preferably, the average engine speed ranges from 0 to 2000 r/min and is divided into 200 small intervals of 10 r/min each;
3) plotting the relation between engine speed and energy consumption, calculating the 90th percentile of energy consumption within each engine speed interval, and taking the energy consumption below that 90th percentile as data of a normal energy consumption level for training the regression model of energy consumption.
Further, generating the XGBoost model through model training specifically includes: using the training set for model training, after removing missing values so that the training set contains none before training begins; energy consumption is used as the label, and the feature values of the original features other than energy consumption, obtained from the offline feature engineering, are used as working condition features; because the label information is explicit, a supervised learning method is adopted, and a regression model is built with the XGBoost algorithm to fit the relation between energy consumption and the working condition features; the hyperparameters of the XGBoost algorithm are set to their default values.
Further, monitoring indexes specifically means: a cross validation method is adopted during model training, and the relevant parameters of the verification set are recorded and used as the monitoring indexes; they are obtained through the following steps:
1) substituting the verification set into the XGBoost model to obtain the simulated values fitted by the model;
2) summing the offline data within each minute to obtain the real energy consumption;
3) calculating the mean square error of the verification set using the formula
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
where $MSE$ is the mean square error, $y_i$ is the true value, $\hat{y}_i$ is the simulated (fitted) value, and $n$ is the number of samples in the verification set;
4) calculating the mean absolute error of the verification set using the formula
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
where $MAE$ is the mean absolute error, $y_i$ is the true value, and $\hat{y}_i$ is the simulated (fitted) value;
5) calculating the coefficient of determination of the verification set using the formula
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
where $R^2$ is the coefficient of determination, $y_i$ is the true value, $\hat{y}_i$ is the simulated (fitted) value, and $\bar{y}$ is the mean of all true values.
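By way of illustration only, the three monitoring indexes above (mean square error, mean absolute error and coefficient of determination) may be computed as in the following sketch, which uses their standard definitions. The function names are hypothetical and the inputs are plain lists of true and simulated values of equal length.

```python
def mse(y_true, y_pred):
    """Mean square error between true and simulated values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error between true and simulated values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

For true values [1, 2, 3, 4] and simulated values [1, 2, 3, 6], the sketch gives a mean square error of 1.0, a mean absolute error of 0.5 and a coefficient of determination of 0.2.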
Furthermore, the real-time data processing uses stream processing: the data are cleaned and preprocessed in real time in preparation for the real-time feature engineering.
Further, the real-time feature engineering specifically includes: obtaining minute-level feature values of the real-time data for real-time prediction. The real-time prediction specifically includes: substituting the combined real-time data into the XGBoost model that was cross-validated in the training stage to obtain a predicted value, and comparing the deviation between the predicted value and the true value.
Further, the visual display of the predicted value specifically includes: plotting the predicted values and the real values as points in a rectangular coordinate system with time on the horizontal axis and the values on the vertical axis, displaying the chart on a visualization platform, and treating a point whose real value is more than 1.2 times the predicted value as an abnormal point.
The anomaly monitoring and alarm avoids letting the error of a single minute-level data point affect the judgment of the result, so statistics are compiled over a daily period: when more than 20% of a vehicle's data points in the day are abnormal, the energy consumption level of the vehicle is considered to have deviated, and an alarm is raised for the vehicle with abnormal energy consumption.
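By way of illustration only, the abnormal-point rule and the daily alarm rule may be sketched as follows. The function names are hypothetical. The sketch reads the rule as "real value greater than 1.2 times the predicted value" and the alarm condition as "more than 20% of the day's minute-level points are abnormal"; both readings are assumptions about the intended thresholds.

```python
def is_abnormal_point(real, predicted, ratio=1.2):
    """A minute-level point is abnormal if the real value exceeds ratio * predicted."""
    return real > ratio * predicted

def daily_alarm(pairs, ratio=1.2, max_share=0.2):
    """pairs: (real, predicted) values of one vehicle over one day.

    Returns True when the share of abnormal points exceeds max_share.
    """
    if not pairs:
        return False  # no data for the day, nothing to alarm on
    abnormal = sum(1 for real, pred in pairs if is_abnormal_point(real, pred, ratio))
    return abnormal / len(pairs) > max_share
```

Evaluating the share over a whole day, rather than alarming on each point, reflects the stated aim of keeping a single erroneous minute-level point from triggering an alarm.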
The invention has the following advantages:
1) the invention realizes vehicle energy consumption evaluation in the current big data setting, can monitor the energy consumption level of a vehicle in real time, provides a reliable basis for vehicle troubleshooting through its early-warning system, avoids to a certain extent the energy waste caused by undetected abnormal energy consumption, and reduces the risk of accidents caused by engine faults that are not found in time;
2) based on Internet of vehicles big data and machine learning, a reasonable analysis model can be built on top of the collection, storage and computation of massive data; the two technologies complement each other, which further improves the accuracy of the model's analysis results and reduces the detection cost.
Drawings
FIG. 1 is a flow chart of the overall implementation of the present invention.
FIG. 2 is a detailed flow chart of the training phase.
FIG. 3 is a flow chart of an embodiment of the prediction phase.
Detailed Description
The present invention will be described in detail with reference to the following embodiments.
A vehicle energy consumption evaluation method based on Internet of vehicles big data comprises a data acquisition stage, a training stage and a prediction stage. The data acquisition stage comprises: step 1-1) acquiring data to obtain original data, step 1-2) cleaning the data, and step 1-3) temporarily storing the data. The training stage comprises: step 2-1) data preprocessing, step 2-2) offline feature engineering, step 2-3) model training to generate an XGBoost model, step 2-4) monitoring indexes, and step 2-5) visually displaying the training results. The prediction stage comprises: step 3-1) real-time data processing, step 3-2) real-time feature engineering, step 3-3) real-time prediction, step 3-4) visual display of the predicted values, and step 3-5) anomaly monitoring and alarm. The training stage is the basis of the prediction stage, which predicts using the XGBoost model generated in the training stage.
Acquiring data to obtain original data specifically includes: transmitting and collecting in real time the original message data collected by the vehicle terminals, and parsing them in batches to obtain the original data. Different batches of original message data are parsed with different parsing standards, and unreasonable data and null data exist, so data cleaning is needed.
The data cleaning specifically includes: removing unreasonable data and null data from the original data obtained by data acquisition, and extracting the corresponding problem data and naming them uniformly. Unreasonable data are data beyond what is conventionally plausible, for example a vehicle speed greater than or equal to 200 km/h. Null data are data for which no value was acquired: network transmission faults, terminal faults and similar causes produce dirty data and the loss of a small amount of data. Because changes in the parsing standard give the same data item different names, the corresponding problem data need to be extracted and named uniformly.
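By way of illustration only, the cleaning described above may be sketched as follows. The field names, the alias table `ALIASES` and the list of required fields are hypothetical and stand in for whatever names the different parsing standards actually produce; only the 200 km/h limit is taken from the text.

```python
# Hypothetical alias table: names produced by different parsing standards
# are mapped to one uniform name.
ALIASES = {"spd": "vehicle_speed", "speed": "vehicle_speed", "rpm": "engine_speed"}

# Hypothetical set of fields that must carry a value.
REQUIRED = ("vehicle_speed", "engine_speed", "energy")

def normalize(record):
    """Rename fields to their uniform names, leaving unknown names unchanged."""
    return {ALIASES.get(key, key): value for key, value in record.items()}

def clean(records):
    """Drop records with null values or an unreasonable vehicle speed."""
    cleaned = []
    for rec in map(normalize, records):
        if any(rec.get(field) is None for field in REQUIRED):
            continue  # null data: a value was not acquired
        if rec["vehicle_speed"] >= 200:
            continue  # unreasonable data: speed of 200 km/h or more
        cleaned.append(rec)
    return cleaned
```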
Temporarily storing the data specifically includes: using a temporary storage module to temporarily store the cleaned original data. Because the training stage needs offline data and the prediction stage needs to predict on real-time data, the temporary storage module both prevents data loss and automatically adjusts the data pulling speed according to the processing capacity.
The data preprocessing specifically comprises the following processes:
1) marking the continuity of the offline data with slice marks: when the time interval between two consecutive offline records is larger than 30 s, the record is marked 1, and all other offline records are marked 0, so as to avoid the selection bias that later screening for continuous working conditions would otherwise introduce;
2) after the slice marking, removing the data collected by vehicle terminals with a very small data volume; the data volume is measured as the total sampling time of each vehicle terminal in one day, calculated from the sampling frequency of that terminal, and a terminal whose total sampling time in one day does not exceed half an hour is regarded as a terminal with a very small data volume; the data collected by such terminals are distributed very differently and would interfere with the training of the XGBoost model;
3) eliminating abnormal data; a record is abnormal when one or more of the following three conditions hold: the vehicle speed is greater than or equal to 200 km/h, the engine speed is negative, or the energy consumption is negative;
4) truncating the time stamp at which the vehicle terminal collected and uploaded the original message data, ignoring the seconds, to minute precision (before conversion: 12:10:50, 12:10:20; after conversion: 12:10) for use in subsequent operations; the offline data within the same minute need to be aggregated when the XGBoost model is built later, and the aggregation method is shown in Table 1;
5) rejecting, for the corresponding terminal, the data of any minute whose slice marks include a 1; when the aggregated marks of a minute include a 1, the data in that minute are poorly continuous, differ greatly from conventional working conditions, and have no reference value.
Data preprocessing is needed because, owing to the personal behavior of drivers, abnormal terminal states, network transmission and similar causes, the data collected by some vehicle terminals are incomplete, and discontinuous data are very common; therefore, before a model is trained, the offline data must be preprocessed so that they can be used for offline feature engineering and for model training.
After the data preprocessing, the data basically meet the requirements of offline feature engineering; a training set and a verification set are then constructed through offline feature engineering as the basis of model training.
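By way of illustration only, steps 2) and 3) of the data preprocessing may be sketched as follows. The field names and function names are hypothetical. The total sampling time of a terminal in one day is taken here as the number of its records multiplied by its sampling period, which is one assumed reading of "calculated from the sampling frequency".

```python
from collections import Counter

def is_abnormal_record(rec):
    """Abnormal if speed >= 200 km/h, engine speed negative, or energy negative."""
    return rec["vehicle_speed"] >= 200 or rec["engine_speed"] < 0 or rec["energy"] < 0

def drop_sparse_terminals(records, period_s, min_total_s=1800):
    """Drop terminals whose total sampling time in a day is half an hour or less.

    records: dicts with "terminal_id" and "day" fields (hypothetical layout).
    period_s: mapping from terminal_id to its sampling period in seconds.
    """
    counts = Counter((r["terminal_id"], r["day"]) for r in records)
    return [
        r for r in records
        if counts[(r["terminal_id"], r["day"])] * period_s[r["terminal_id"]] > min_total_s
    ]

def preprocess(records, period_s):
    """Apply step 2) and then step 3) to the offline records."""
    kept = drop_sparse_terminals(records, period_s)
    return [r for r in kept if not is_abnormal_record(r)]
```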
The offline feature engineering extracts feature values from the original features of the preprocessed offline data and divides the extracted feature values into a training set and a verification set. The original features comprise vehicle speed, engine speed and energy consumption; the offline data is original data that has been temporarily stored for more than three days. Further, the original features also include the reciprocal of the transmission ratio (gear), ambient temperature, atmospheric humidity, atmospheric pressure, engine oil temperature, engine water temperature, actual total torque percentage, friction torque percentage, engine net output torque, actual torque percentage and fan speed. Extracting feature values from the original features of the offline data makes the data easier to use for model training, and the extracted feature values are divided into a training set and a verification set for cross validation.
The training set is constructed in order to train a model that can detect vehicles whose energy consumption exceeds the normal level under working conditions similar to those described by the extracted feature values. Because a more direct evaluation standard for the energy consumption level is lacking, the method adopts a cascade model approach: vehicle data with a normal energy consumption level are selected as the training set, and the relation between energy consumption and the vehicle's running state and working conditions is built. Preferably, the cascade model uses the XGBoost algorithm, which is a cascade model of decision trees.
The method of extracting the feature values is shown in Table 1. The variables most closely related to the energy consumption level are engine speed and vehicle speed, so they are indispensable feature values; energy consumption is likewise indispensable as the supervised learning label. The feature values of the other original features also have some influence on the energy consumption level and can be adjusted according to the original data actually collected by the terminals. The energy consumption data need to be screened based on engine speeds of different standards.
The feature value extraction for energy consumption includes screening the energy consumption data, which specifically comprises the following steps:
1) aggregating the minute-level data formed by the data preprocessing, focusing on the average engine speed of each vehicle in each minute;
2) partitioning the average engine speed into intervals; preferably, the average engine speed ranges from 0 to 2000 r/min and is divided into 200 small intervals of 10 r/min each;
3) plotting the relation between engine speed and energy consumption, calculating the 90th percentile of energy consumption within each engine speed interval, and taking the energy consumption below that 90th percentile as data of a normal energy consumption level for training the regression model of energy consumption.
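By way of illustration only, the screening of the energy consumption data may be sketched as follows. The function names are hypothetical. The sketch assumes linear interpolation for the percentile, keeps values at or below the cut-off of their interval, and ignores rows whose average engine speed falls outside 0 to 2000 r/min; none of these details is specified in the text.

```python
import math
from collections import defaultdict

def percentile(values, q):
    """Percentile q (0 to 100) of a non-empty list, with linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def screen_normal(rows, bin_width=10, rpm_max=2000, q=90):
    """rows: (average engine speed, energy consumption) per vehicle and minute.

    Keeps the rows whose energy consumption does not exceed the q-th
    percentile of their engine speed interval.
    """
    bins = defaultdict(list)
    for rpm, energy in rows:
        if 0 <= rpm < rpm_max:
            bins[int(rpm // bin_width)].append(energy)
    cutoffs = {b: percentile(energies, q) for b, energies in bins.items()}
    return [
        (rpm, energy) for rpm, energy in rows
        if 0 <= rpm < rpm_max and energy <= cutoffs[int(rpm // bin_width)]
    ]
```

With the preferred values, `bin_width=10` and `rpm_max=2000` give the 200 intervals described in step 2).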
Through the above processing, the training set that enters model training is, as far as possible, data with a normal energy consumption level, which avoids the errors that abnormal data would introduce when the cascade model is built. The training set produced by offline feature engineering is used as the training data for the XGBoost model: energy consumption is the target variable and the remaining variables are feature variables describing the vehicle's operating conditions and overall condition, namely the feature values of original features such as engine speed and vehicle speed; the model captures the relation between energy consumption and these feature variables at a normal energy consumption level. Aggregating second-level data into minute-level data produces missing values, which may harm training and need to be filtered out before model training; data below the 90th percentile are selected as data with a normal energy consumption level. The verification set consists of the feature value data obtained from offline feature engineering that are not covered by the training set, and is used to compare the performance of the cascade model.
Generating the XGBoost model through model training specifically includes: removing missing values from the whole feature set, where the feature set is the set of feature values extracted in the offline feature engineering, and taking eighty percent of the processed feature set as the training set of the model, so that the training set contains no missing values before training; energy consumption is used as the label, and the feature values of the original features other than energy consumption, obtained from the offline feature engineering, are used as working condition features; because the label information is explicit, a supervised learning method is adopted, and a regression model is built with the XGBoost algorithm to fit the relation between energy consumption and the working condition features; the hyperparameters of the XGBoost algorithm are set to their default values.
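By way of illustration only, the removal of missing values and the eighty percent split may be sketched as follows. The function name, the row layout (one dict of feature values per vehicle and minute) and the seeded shuffle before splitting are assumptions of this sketch; the text does not state how the eighty percent is chosen. Fitting the regression model itself is omitted here.

```python
import random

def split_feature_set(rows, train_share=0.8, seed=0):
    """Drop rows with missing values, then split into training and verification sets.

    rows: list of dicts of feature values (hypothetical layout); None marks a
    missing value. Shuffling before the split is an assumption of this sketch.
    """
    complete = [r for r in rows if all(v is not None for v in r.values())]
    shuffled = complete[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]
```

In this sketch the rows not selected for the training set form the verification set, in line with the description of the verification set as the feature value data not covered by the training set.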
The monitoring indexes are specifically as follows: during model training, a cross-validation method is adopted, and the relevant metrics of the verification set are recorded and used as monitoring indexes; these metrics are obtained through the following steps:
1) substituting the verification set into the XGboost model and obtaining the simulated value fitted by the XGboost model;
2) summing the offline data within each minute to obtain the real energy consumption;
3) calculating the mean square error of the verification set using the formula
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
where $MSE$ is the mean square error, $y_i$ is the true value, $\hat{y}_i$ is the simulated value, and $n$ is the number of samples in the verification set;
4) calculating the mean absolute error of the verification set using the formula
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
where $MAE$ is the mean absolute error, $y_i$ is the true value, and $\hat{y}_i$ is the simulated value;
5) calculating the coefficient of determination of the verification set using the formula
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
where $R^2$ is the coefficient of determination, $y_i$ is the true value, $\hat{y}_i$ is the simulated value, and $\bar{y}$ is the mean of all true values.
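The three monitoring indexes can be computed as in the following sketch, which uses the standard definitions of mean square error, mean absolute error, and the coefficient of determination. The function names are illustrative and not taken from the source.

```python
def mse(y_true, y_pred):
    """Mean square error between true and simulated values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error between true and simulated values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot
```

For example, with true values `[1, 2, 3]` and simulated values `[1, 2, 4]`, the mean square error and the mean absolute error are both 1/3 and the coefficient of determination is 0.5.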
These monitoring indexes make it possible to check whether the XGboost model has problems; the accuracy of the XGboost model directly affects the results of the prediction stage.
The main objective of visually displaying the training result is to observe more intuitively the reliability of the XGboost model obtained by training. The performance of the XGboost model on the verification set can be used as the standard for judging the effect of the model, and corresponding reports are built on a visualization platform from data points recorded during model training to display the training result. The XGboost model must not only pass verification on the verification set; real-time data must also be substituted into the XGboost model at least three times, and the reliability of the model is verified by observing the resulting monitoring indexes.
Real-time data processing uses stream processing; the real-time data are cleaned and preprocessed in preparation for real-time feature engineering.
The real-time feature engineering specifically comprises: obtaining minute-level feature values of the real-time data for real-time prediction. The real-time prediction specifically comprises: substituting the combined real-time data into the XGboost model that was cross-validated in the training stage to obtain a predicted value, and comparing the deviation between the predicted value and the true value.
The visual display of the predicted value specifically comprises: plotting the predicted value and the real value in a rectangular coordinate system with time on the horizontal axis and the predicted and real values on the vertical axis, and displaying the chart on a visualization platform; a point whose real value exceeds 1.2 times the predicted value is taken as an abnormal point.
For abnormality monitoring and alarming, statistics are computed over a daily period so that errors in individual minute-level data points do not distort the judgment: when more than 20% of the data in a day are abnormal, the energy consumption level of the vehicle is considered to have deviated, and an alarm is raised for the vehicle with abnormal energy consumption.
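The abnormal-point rule and the daily alarm rule can be sketched as follows. The tuple layout and function names are assumptions, and the sketch reads "1.2 times higher" as a real value greater than 1.2 times the predicted value.

```python
from collections import defaultdict
from datetime import datetime


def is_abnormal(real, predicted, ratio=1.2):
    """A minute-level point is abnormal when real > ratio * predicted."""
    return real > ratio * predicted


def daily_alarms(points, ratio=1.2, share=0.2):
    """Return {date: True/False}, True when more than `share` of that
    day's minute-level points are abnormal.

    points: iterable of (timestamp, real, predicted) with datetime timestamps.
    """
    counts = defaultdict(lambda: [0, 0])  # date -> [abnormal, total]
    for ts, real, predicted in points:
        entry = counts[ts.date()]
        entry[0] += 1 if is_abnormal(real, predicted, ratio) else 0
        entry[1] += 1
    return {day: abnormal / total > share
            for day, (abnormal, total) in counts.items()}
```

Aggregating per day before alarming means a handful of noisy minutes cannot trigger an alarm on their own, which is the stated purpose of the daily statistic.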
The aggregation modes used when converting the timestamps to the corresponding minute level are listed in Table 1:
[Table 1: aggregation mode applied to each feature; the table image is not reproduced in this text]
The maximum, minimum, mean, median, and sum in Table 1 mean taking, respectively, the maximum, minimum, mean, median, and sum of the data of each processed feature within the same minute.
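Minute-level aggregation with these five statistics can be sketched as below. Since Table 1 is not reproduced here, the mapping from feature to aggregation mode (`plan`) is a hypothetical input, as are the feature names; truncating the timestamp to the minute is also an assumption about how timestamps are made "accurate to minutes".

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median

AGGREGATORS = {"max": max, "min": min, "mean": mean, "median": median, "sum": sum}


def aggregate_by_minute(records, plan):
    """Aggregate second-level samples into minute-level feature values.

    records: iterable of (timestamp, {feature: value}) samples.
    plan: {feature: [aggregator names]}, a hypothetical stand-in for Table 1.
    Returns {minute_timestamp: {"<feature>_<aggregator>": value or None}}.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for ts, values in records:
        minute = ts.replace(second=0, microsecond=0)  # truncate to the minute
        for feature, value in values.items():
            if value is not None:
                buckets[minute][feature].append(value)
    result = {}
    for minute, features in buckets.items():
        row = {}
        for feature, names in plan.items():
            samples = features.get(feature, [])
            for name in names:
                # A feature with no samples in this minute yields a missing value.
                row[f"{feature}_{name}"] = AGGREGATORS[name](samples) if samples else None
        result[minute] = row
    return result
```

A feature with no samples in a given minute comes out as `None`, which is one way the missing values mentioned for the aggregation step can arise before they are filtered out ahead of training.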
The overall implementation flow of the method is shown in figure 1. The original message data collected by the terminal are parsed to obtain the original data, and preliminary data cleaning is then performed. The data collected differ between terminals; all measuring points used in this method are national-standard data. The data are placed in a temporary storage module, after which the offline data stored locally are used for analysis and modeling. The XGboost model obtained through model training performs real-time prediction on the processed real-time data, and the predicted values are compared with the real values, given by the energy consumption in the feature values, to find equipment with abnormal energy consumption. The temporary storage module mainly buffers the collected data, which avoids data loss and allows offline and real-time processing to proceed at the same time. Model training as a whole adopts a cascade form: data preprocessing is carried out first, and the XGboost algorithm is then used to construct the relationship between energy consumption and the feature values at a normal energy consumption level and obtain a predicted value; whether the vehicle is at an abnormal energy consumption level is judged from the difference between the predicted value and the real energy consumption, represented by the minute-level energy consumption obtained from real-time feature engineering.
Some terms in the present invention are defined as follows:
Abnormal energy consumption: specifically, the energy consumption while the vehicle is running is higher than the normal level.
Supervised learning: a method of constructing models from labeled data, which can be used to build classification or regression models.
Offline data: acquired data that have been stored; preferably, data stored for more than three days are taken as offline data.
Real-time data: data collected online in real time.
Cascade model: a common model-integration method in which the model at the next stage uses the output of the model at the previous stage, which may be the output result of that model or a feature obtained after secondary processing.

Claims (10)

1. A vehicle energy consumption evaluation method based on Internet of vehicles big data is characterized by comprising a data acquisition stage, a training stage and a prediction stage; the data acquisition phase comprises: step 1-1) acquiring data to obtain original data, step 1-2) cleaning the data, and step 1-3) temporarily storing the data; the training phase comprises: step 2-1) data preprocessing, step 2-2) off-line feature engineering, step 2-3) model training is carried out to generate an XGboost model, step 2-4) indexes are monitored, and step 2-5) training results are displayed in a visual mode; the prediction phase comprises: step 3-1) real-time data processing, step 3-2) real-time feature engineering, step 3-3) real-time prediction, step 3-4) visual display of a predicted value, and step 3-5) abnormal monitoring and alarming; the training phase is the basis of the prediction phase, and the prediction phase carries out prediction on the basis of the XGboost model generated in the training phase.
2. The vehicle energy consumption evaluation method based on the internet of vehicles big data as claimed in claim 1, wherein acquiring data to obtain the original data specifically comprises: transmitting and collecting in real time the original message data collected by a vehicle terminal, and parsing them in batches to obtain the original data; the data cleaning specifically comprises: removing unreasonable data and null data from the original data obtained by data acquisition; the temporary data storage specifically comprises: temporarily storing the cleaned original data in a temporary storage module; and original data that have been temporarily stored for more than three days are taken as offline data.
3. The vehicle energy consumption evaluation method based on the internet of vehicles big data as claimed in claim 2, wherein the data preprocessing specifically comprises the following processes:
1) marking the continuity of the offline data with slice marks: for two consecutive offline data records, offline data with a time interval greater than 20 s are marked as 1, and the other offline data are marked as 0;
2) after slice marking, removing the data collected by vehicle terminals with a very small data volume; the data volume is assessed by calculating, from the sampling frequency of the specific vehicle terminal, the total sampling time of each vehicle terminal in one day, and a vehicle terminal whose total sampling time in one day spans no more than half an hour is regarded as a vehicle terminal with a very small data volume;
3) eliminating abnormal data, the abnormal data being data that meet one or more of the following three conditions: the vehicle speed is greater than or equal to 200 km/h, the engine speed is negative, or the energy consumption is negative;
4) according to the time at which the vehicle terminal collected and uploaded the original message data, making the timestamp accurate to the minute and thereby converting it to the corresponding minute level;
5) eliminating, for the corresponding terminal, the data of any minute whose slice marks contain a 1.
4. The vehicle energy consumption evaluation method based on the internet of vehicles big data as claimed in claim 2, wherein the off-line feature engineering is to extract feature values of original features from off-line data after data preprocessing, and divide the extracted feature values into a training set and a verification set; the raw characteristics include vehicle speed, engine speed, energy consumption.
5. The vehicle energy consumption evaluation method based on the internet of vehicles big data as claimed in claim 4, wherein the characteristic value extraction processing of the energy consumption comprises screening the energy consumption data, and the screening of the energy consumption data specifically comprises the following steps:
1) aggregating the minute-level data formed after data preprocessing, focusing on the average engine speed of each vehicle in each minute;
2) dividing the average engine speed into intervals;
3) plotting the relationship between engine speed and energy consumption, calculating the 90th percentile to obtain the 90th-percentile point in each engine speed interval, and taking energy consumption below that point as data at a normal energy consumption level for training the energy consumption regression model.
6. The vehicle energy consumption evaluation method based on the internet of vehicles big data according to claim 4, wherein the model training for generating the XGboost model specifically comprises: using the training set for model training, and removing missing values from the training set so that no missing value is present before model training; using energy consumption as the label and the feature values of the original features other than energy consumption obtained from offline feature engineering as working-condition features; adopting a supervised learning method and building a regression model with the XGboost algorithm to fit the relationship between energy consumption and the working-condition features; and setting the hyper-parameters of the XGboost algorithm to their default values.
7. The vehicle energy consumption evaluation method based on the internet of vehicles big data according to claim 4, wherein the monitoring indexes are specifically as follows: during model training, a cross-validation method is adopted, and the relevant metrics of the verification set are recorded and used as monitoring indexes; these metrics are obtained through the following steps:
1) substituting the verification set into the XGboost model and obtaining the simulated value fitted by the XGboost model;
2) summing the offline data within each minute to obtain the real energy consumption;
3) calculating the mean square error of the verification set using the formula
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
where $MSE$ is the mean square error, $y_i$ is the true value, $\hat{y}_i$ is the simulated value, and $n$ is the number of samples in the verification set;
4) calculating the mean absolute error of the verification set using the formula
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
where $MAE$ is the mean absolute error, $y_i$ is the true value, and $\hat{y}_i$ is the simulated value;
5) calculating the coefficient of determination of the verification set using the formula
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
where $R^2$ is the coefficient of determination, $y_i$ is the true value, $\hat{y}_i$ is the simulated value, and $\bar{y}$ is the mean of all true values.
8. The vehicle energy consumption evaluation method based on the internet of vehicles big data as claimed in claim 1, wherein the real-time data processing uses stream processing, and the real-time data are cleaned and preprocessed in preparation for real-time feature engineering.
9. The vehicle energy consumption evaluation method based on the internet of vehicles big data according to claim 7, wherein the real-time feature engineering specifically comprises: obtaining minute-level feature values of the real-time data for real-time prediction; and the real-time prediction specifically comprises: substituting the real-time data into the XGboost model that was cross-validated in the training stage to obtain a predicted value, and comparing the deviation between the predicted value and the true value.
10. The vehicle energy consumption evaluation method based on the internet of vehicles big data according to claim 9, wherein the visual display of the predicted value specifically comprises: plotting the predicted value and the real value in a rectangular coordinate system with time on the horizontal axis and the predicted and real values on the vertical axis, displaying the chart on a visualization platform, and taking a point whose real value exceeds 1.2 times the predicted value as an abnormal point; and the abnormality monitoring alarm is computed over a daily period: when more than 20% of the data in a day are abnormal, the energy consumption level of the vehicle is considered to have deviated, and an alarm is raised for the vehicle with abnormal energy consumption.
CN202110248864.5A 2021-03-08 2021-03-08 Vehicle energy consumption evaluation method based on Internet of vehicles big data Active CN112633781B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110248864.5A CN112633781B (en) 2021-03-08 2021-03-08 Vehicle energy consumption evaluation method based on Internet of vehicles big data

Publications (2)

Publication Number Publication Date
CN112633781A true CN112633781A (en) 2021-04-09
CN112633781B CN112633781B (en) 2021-06-08

Family

ID=75297621

Country Status (1)

Country Link
CN (1) CN112633781B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant