CN111626508A - Rail transit vehicle-mounted data prediction method based on xgboost model - Google Patents

Info

Publication number
CN111626508A
Authority
CN
China
Prior art keywords
vehicle
mounted data
rail transit
data
xgboost model
Legal status
Granted
Application number
CN202010460661.8A
Other languages
Chinese (zh)
Other versions
CN111626508B (en)
Inventor
Wang Xiaoling (王晓玲)
Li Xin (李欣)
Current Assignee
East China Normal University
Original Assignee
East China Normal University
Priority date
2020-05-27
Filing date
2020-05-27
Publication date
2020-09-04
Application filed by East China Normal University
Priority to CN202010460661.8A
Publication of CN111626508A
Application granted
Publication of CN111626508B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Tourism & Hospitality (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computing Systems (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a rail transit vehicle-mounted data prediction method based on an xgboost model. Rail transit vehicle-mounted data are first collected. Representative features are then selected from all of the rail transit vehicle-mounted data features using a CART decision tree, and the data for those features are extracted from the original vehicle-mounted data to form the feature-extracted vehicle-mounted data. An xgboost model is constructed from the feature-extracted data and their corresponding labels. Finally, vehicle-mounted data collected during actual rail transit operation are input into the xgboost model to obtain a predicted stopping distance. Because the representative features are extracted with a CART decision tree before the xgboost model is built, the method can effectively improve the accuracy of rail transit stopping distance prediction.

Description

Rail transit vehicle-mounted data prediction method based on xgboost model
Technical Field
The invention belongs to the technical field of rail transit, and particularly relates to a rail transit vehicle-mounted data prediction method based on an xgboost model.
Background
Rail transit has become an increasingly indispensable part of urban life. Hundreds of sensors are installed on trains and along the lines to monitor various data during train operation, and relying purely on manual analysis to determine the causes of train faults and errors involves a heavy workload. Analysing the sensor data also helps adjust train operating parameters in time, giving passengers a better travel experience. Companies increasingly value data analysis, whose most important task is to analyse historical data and make predictions about the future from it.
Rail transit vehicle-mounted data are representative of the data found in most application domains: the volume is huge, there are many features, and the data types are rich. Since rail transit is an indispensable means of urban travel, data analysis is an indispensable part of rail transit operation. However, as times change and technology advances, traditional manual analysis can no longer cope with growing data volumes and new analysis requirements. With the rapid development of artificial intelligence and machine learning, data-driven services are increasing daily, and using machine learning algorithms for data cleaning, feature selection and feature combination, and building models to analyse massive data, has become common practice in the industry.
For feature extraction from rail transit data, common methods include principal component analysis (PCA) and the correlation coefficient method. PCA can compress a large number of features while retaining the more important ones, but it is only suitable when the variables are strongly correlated; for weakly correlated data its feature extraction is not ideal. In addition, some information may be lost during extraction and the meaning of the data may change, so the result is less interpretable than the original data. The correlation coefficient method is sensitive to how the data are organized and assumes linear correlation; if the data are nonlinearly related, for example by a square relationship, the correlation coefficient may be very small.
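The weakness of the correlation coefficient method for nonlinear relationships can be seen in a small numerical example. The sketch below assumes NumPy is available; the variables are illustrative, not rail transit data. It computes the Pearson correlation between a feature `x` sampled symmetrically around zero and a target `y = x**2`, which is fully determined by `x`:

```python
import numpy as np

# Illustrative feature sampled symmetrically around zero (not real sensor data).
x = np.linspace(-1.0, 1.0, 201)
# Target with an exact square (nonlinear) dependence on x.
y = x ** 2

# Pearson correlation coefficient between x and y.
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.6f}")
```

Although `y` is a deterministic function of `x`, `r` comes out essentially zero, so a filter based on the correlation coefficient would discard `x` as irrelevant.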
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a rail transit vehicle-mounted data prediction method based on an xgboost model.
To achieve this purpose, the rail transit vehicle-mounted data prediction method based on the xgboost model comprises the following steps:
S1: setting M vehicle-mounted data features of the rail transit system according to actual needs, collecting the values of the M features at each of N stops during actual rail transit operation, and denoting the value of the m-th feature at the n-th stop as $f_{nm}$, $n = 1, 2, \ldots, N$, $m = 1, 2, \ldots, M$; forming the M feature values obtained at each stop into one piece of vehicle-mounted data $F_n = \{f_{n1}, f_{n2}, \ldots, f_{nM}\}$, and simultaneously recording the distance $d_n$ between the train door and the platform screen door when the stop is completed as the label of that piece of vehicle-mounted data;
S2: constructing a CART decision tree from the N pieces of rail transit vehicle-mounted data obtained in step S1 and their labels; traversing the generated CART decision tree level by level from the root node to the leaf nodes and extracting the vehicle-mounted data feature used as the splitting point at each node, these being the representative vehicle-mounted data features, whose number is denoted P; and extracting the values of the P representative features from the original N pieces of vehicle-mounted data, the N pieces of data so extracted being the feature-extracted vehicle-mounted data;
S3: constructing an xgboost model from the feature-extracted vehicle-mounted data and their labels;
S4: during rail transit operation, collecting the values of the P representative vehicle-mounted data features at the current moment and inputting them into the xgboost model to obtain a predicted stopping distance.
In summary, the method first collects rail transit vehicle-mounted data, selects representative features with a CART decision tree, builds an xgboost model on the feature-extracted data and their labels, and then inputs vehicle-mounted data collected during actual operation into the model to predict the stopping distance. Extracting representative features before building the xgboost model can effectively improve the accuracy of rail transit stopping distance prediction.
Drawings
Fig. 1 is a flowchart of an embodiment of a rail transit vehicle-mounted data prediction method based on an xgboost model.
Detailed Description
Specific embodiments of the invention are described below with reference to the accompanying drawing so that those skilled in the art can better understand the invention. Note that detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the invention.
Examples
Fig. 1 is a flowchart of an embodiment of a rail transit vehicle-mounted data prediction method based on an xgboost model. As shown in fig. 1, the method for predicting the vehicle-mounted data of the rail transit based on the xgboost model comprises the following specific steps:
S101: Collecting rail transit vehicle-mounted data:
M vehicle-mounted data features of the rail transit system are set according to actual needs, and their values are collected at each of N stops during actual rail transit operation. The value of the m-th feature at the n-th stop is denoted $f_{nm}$, $n = 1, 2, \ldots, N$, $m = 1, 2, \ldots, M$, and the M feature values obtained at each stop form one piece of vehicle-mounted data $F_n = \{f_{n1}, f_{n2}, \ldots, f_{nM}\}$. At the same time, the distance $d_n$ between the train door and the platform screen door when the stop is completed is recorded and used as the label of that piece of vehicle-mounted data.
Each piece of rail transit vehicle-mounted data therefore contains M features and describes the state of the train at the corresponding moment; its label, the distance between the train door and the platform screen door, is the outcome associated with that state. Table 1 shows example vehicle-mounted data from this embodiment.
TABLE 1 (image not reproduced): example vehicle-mounted data
The time column gives the time at which each piece of vehicle-mounted data was collected, which is also the time at which the train stopped.
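The data layout of step S101 can be sketched as follows, assuming NumPy: each stop contributes one row $F_n$ of M feature values and one label $d_n$. The feature names and numbers below are hypothetical placeholders, not values from Table 1.

```python
import numpy as np

# Hypothetical feature names; the real M features are chosen "according to actual needs".
FEATURES = ["speed_at_brake_start", "brake_pressure", "train_load", "gradient"]

# Hypothetical records: one dict per stop n, holding the M feature values f_nm
# and the label d_n (train door to platform screen door distance at the end of the stop).
stops = [
    {"speed_at_brake_start": 12.1, "brake_pressure": 3.2, "train_load": 0.61, "gradient": 0.0, "d": 0.12},
    {"speed_at_brake_start": 11.8, "brake_pressure": 3.4, "train_load": 0.75, "gradient": 0.2, "d": 0.05},
    {"speed_at_brake_start": 12.4, "brake_pressure": 3.1, "train_load": 0.40, "gradient": -0.1, "d": 0.20},
]

def build_dataset(records, feature_names):
    """Return (F, d): F is the N x M feature matrix, d the length-N label vector."""
    F = np.array([[rec[name] for name in feature_names] for rec in records], dtype=float)
    d = np.array([rec["d"] for rec in records], dtype=float)
    return F, d

F, d = build_dataset(stops, FEATURES)
print(F.shape, d.shape)  # (3, 4) (3,)
```

Keeping the column order fixed by `FEATURES` matters later, because the selected P columns must be read in the same order at training and prediction time.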
S102: Extracting representative vehicle-mounted data features:
Because vehicle-mounted data features are extremely rich in practice, M can be very large, yet many of these dimensions have little or no influence on the train's stopping distance. To simplify the computation and speed it up, the data that do influence the stopping distance need to be extracted from the massive data. Representative feature extraction means selecting, from a large number of features, those that can represent the characteristics of the data. Because the correlation coefficient method and principal component analysis (PCA) have limitations, the invention provides a representative vehicle-mounted data feature extraction method based on a CART (Classification And Regression Trees) decision tree, with the following specific steps:
A CART decision tree is constructed from the N pieces of rail transit vehicle-mounted data obtained in step S101 and their labels. The generated CART decision tree is then traversed level by level from the root node to the leaf nodes, and the vehicle-mounted data feature used as the splitting point at each node is extracted; these are the representative vehicle-mounted data features, and their number is denoted P. The values of these P representative features are extracted from the original N pieces of vehicle-mounted data, and the N pieces of data so obtained are the feature-extracted vehicle-mounted data.
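The level-order (breadth-first) extraction of splitting features in step S102 can be sketched in plain Python. The `Node` class and the hand-built tree below are hypothetical. The sketch assumes that a feature used at several nodes is counted once, so that P is the number of distinct splitting features, and it keeps them in the order in which the traversal first meets them.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    """Hypothetical CART node: internal nodes have a feature index and two children."""
    feature: Optional[int] = None      # index m of the splitting feature; None for a leaf
    threshold: Optional[float] = None  # splitting value
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def split_features_level_order(root: Node) -> List[int]:
    """Visit nodes level by level from the root and collect distinct splitting features."""
    selected: List[int] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.feature is None:       # leaf: no split, nothing to extract
            continue
        if node.feature not in selected:
            selected.append(node.feature)
        queue.append(node.left)
        queue.append(node.right)
    return selected

def extract_columns(F, feature_indices):
    """Keep only the P selected columns of each row of the N x M data."""
    return [[row[m] for m in feature_indices] for row in F]

# Hand-built example tree over M = 5 features (indices 0..4).
tree = Node(feature=3, threshold=0.5,
            left=Node(feature=1, threshold=2.0, left=Node(), right=Node()),
            right=Node(feature=4, threshold=7.0,
                       left=Node(feature=1, threshold=1.0, left=Node(), right=Node()),
                       right=Node()))

P_features = split_features_level_order(tree)
print(P_features)                                          # [3, 1, 4]
print(extract_columns([[10, 11, 12, 13, 14]], P_features))  # [[13, 11, 14]]
```

If the tree is instead fitted with scikit-learn's `DecisionTreeRegressor`, the same information is available in the fitted `tree_` object's `feature`, `children_left` and `children_right` arrays, where leaf nodes have `children_left == -1`.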
A CART regression tree is a binary tree built by recursively dividing the input space of the training set into two sub-regions at each step and determining an output value on each sub-region. The procedure can be summarized as follows:
For the original vehicle-mounted data containing M features, every value of every feature is traversed and used to divide the original N pieces of vehicle-mounted data into two sets; the mean squared error of each set is computed, and the value that minimizes the sum of the two sets' errors is found. The feature corresponding to that value is the optimal splitting feature at this node, and the value itself is the optimal splitting value. Each of the two resulting sets is then divided in the same way, by searching for its own optimal splitting feature and value, until a termination condition is reached.
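The split search and recursive division described above can be sketched with NumPy as follows. As an assumption, the sketch uses the standard CART regression criterion, choosing the feature $m$ and value $v$ that minimize $\sum_{n:\,f_{nm}\le v}(d_n-\bar d_L)^2+\sum_{n:\,f_{nm}>v}(d_n-\bar d_R)^2$, where $\bar d_L$ and $\bar d_R$ are the mean labels of the two sets. The termination conditions (`max_depth`, `min_samples`, or a pure node) are hypothetical choices, since the text only says "until a termination condition is reached".

```python
import numpy as np

def squared_error(y):
    """Sum of squared deviations from the mean (0 for an empty set)."""
    return float(((y - y.mean()) ** 2).sum()) if y.size else 0.0

def best_split(X, y):
    """Try every value of every feature as a split x <= v; return (feature, value, cost)."""
    best = (None, None, np.inf)
    for m in range(X.shape[1]):
        # The largest value would leave the right set empty, so it is skipped.
        for v in np.unique(X[:, m])[:-1]:
            mask = X[:, m] <= v
            cost = squared_error(y[mask]) + squared_error(y[~mask])
            if cost < best[2]:
                best = (m, float(v), cost)
    return best

def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Recursively divide the data until a (hypothetical) termination condition holds."""
    leaf = {"value": float(y.mean())}
    if depth >= max_depth or y.size < min_samples or np.all(y == y[0]):
        return leaf
    m, v, _ = best_split(X, y)
    if m is None:                      # every feature is constant: no valid split
        return leaf
    mask = X[:, m] <= v
    return {
        "feature": m,
        "threshold": v,
        "left": build_tree(X[mask], y[mask], depth + 1, max_depth, min_samples),
        "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth, min_samples),
    }

X = np.array([[1, 3], [2, 1], [3, 2], [4, 4]], dtype=float)
y = np.array([0.0, 0.0, 10.0, 10.0])
tree = build_tree(X, y)
print(tree["feature"], tree["threshold"])  # 0 2.0
```

As written, the search costs on the order of M·N² operations per node; practical implementations sort each feature once and update running sums, which keeps the exhaustive search affordable for large N.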
S103: Constructing an xgboost model:
An xgboost model is constructed from the feature-extracted vehicle-mounted data and their labels.
xgboost has become a widely used learning model in recent years. Following the ensemble idea, it combines multiple models, using the training result of the previous models to further fit their residuals, and performs well on most regression and classification problems. The xgboost model is iterative: it consists of multiple CART decision trees, and each subsequent tree is obtained by fitting the residuals of the preceding trees. For the detailed principle and construction of the xgboost model, see Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System", in Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2016.
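The residual-fitting idea can be illustrated with a minimal gradient-boosting loop for squared error, in which each new tree is fitted to the residuals of the current ensemble. This is a simplified sketch, assuming NumPy and depth-1 regression trees (stumps) as base learners; it omits xgboost's regularization term, second-order approximation and other refinements described in the paper cited above, and the learning rate and number of rounds are arbitrary illustrative values. In practice one would use the xgboost library itself, for example its scikit-learn-style `xgboost.XGBRegressor` estimator.

```python
import numpy as np

def fit_stump(X, r):
    """Fit a depth-1 regression tree (one split) to the residuals r by squared error."""
    best = None
    for m in range(X.shape[1]):
        for v in np.unique(X[:, m])[:-1]:
            mask = X[:, m] <= v
            left, right = r[mask].mean(), r[~mask].mean()
            cost = ((r[mask] - left) ** 2).sum() + ((r[~mask] - right) ** 2).sum()
            if best is None or cost < best[0]:
                best = (cost, m, v, left, right)
    if best is None:                       # no valid split: contribute nothing
        return lambda Z: np.zeros(Z.shape[0])
    _, m, v, left, right = best
    return lambda Z: np.where(Z[:, m] <= v, left, right)

def boost(X, y, n_rounds=50, learning_rate=0.3):
    """Additive model: pred_t = pred_{t-1} + learning_rate * f_t, f_t fitted to residuals."""
    base = float(y.mean())
    trees = []
    pred = np.full(y.shape, base)
    for _ in range(n_rounds):
        residual = y - pred                # what the current ensemble still gets wrong
        tree = fit_stump(X, residual)
        trees.append(tree)
        pred = pred + learning_rate * tree(X)
    return lambda Z: base + learning_rate * sum(t(Z) for t in trees)

# Synthetic data for illustration only (not rail transit measurements).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(100, 3))
y = 2.0 * X[:, 0] + np.sin(4.0 * X[:, 1])
model = boost(X, y)
print("variance of y:", float(np.var(y)))
print("training MSE: ", float(np.mean((model(X) - y) ** 2)))
```

With squared loss and a learning rate below 2, each least-squares stump can only lower the training error, so the printed training MSE ends up well below the variance of `y`.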
To improve the performance of the constructed xgboost model, the feature-extracted (dimension-reduced) vehicle-mounted data can be divided into a training set and a test set. The training set is first used to construct each decision tree in the xgboost model; the test set is then used to test each decision tree, and decision trees with larger errors can be further pruned.
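A train/test split of the feature-extracted data, as suggested above, can be sketched with NumPy. The 80/20 ratio, the fixed random seed and the function name `train_test_split_rows` are assumptions for illustration; the text specifies none of them, nor exactly how the test set drives pruning, so that step is not shown.

```python
import numpy as np

def train_test_split_rows(F, d, test_fraction=0.2, seed=0):
    """Shuffle row indices and split (F, d) into training and test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(d))
    n_test = int(round(test_fraction * len(d)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return F[train_idx], d[train_idx], F[test_idx], d[test_idx]

F = np.arange(20, dtype=float).reshape(10, 2)   # 10 hypothetical stops, P = 2 features
d = np.arange(10, dtype=float)                  # hypothetical labels
F_train, d_train, F_test, d_test = train_test_split_rows(F, d)
print(F_train.shape, F_test.shape)  # (8, 2) (2, 2)
```

Shuffling indices once and applying them to both `F` and `d` keeps each row aligned with its label.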
S104: Predicting the stopping distance:
During rail transit operation, the values of the P representative vehicle-mounted data features at the current moment are collected and input into the xgboost model to obtain a predicted stopping distance.
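At prediction time only the P selected features are needed. A minimal sketch of assembling the model input from a current sensor reading, in plain Python: the feature names, the reading, and the `predict` callable are hypothetical stand-ins. With a trained xgboost model, `predict` would wrap that model's prediction method.

```python
from typing import Callable, Dict, List, Sequence

def predict_stopping_distance(reading: Dict[str, float],
                              selected: Sequence[str],
                              predict: Callable[[List[List[float]]], List[float]]) -> float:
    """Build a one-row input in the training column order and return the prediction."""
    missing = [name for name in selected if name not in reading]
    if missing:
        raise KeyError(f"reading lacks selected features: {missing}")
    row = [float(reading[name]) for name in selected]   # P values, fixed order
    return predict([row])[0]

# Hypothetical selected features (P = 3) and a full reading with an extra, unused sensor.
SELECTED = ["brake_pressure", "speed_at_brake_start", "train_load"]
reading = {"speed_at_brake_start": 12.0, "brake_pressure": 3.0,
           "train_load": 0.5, "cabin_temperature": 24.0}

def toy_predict(rows):
    """Stand-in model: a fixed linear function, used only to make the sketch runnable."""
    return [0.1 * r[0] - 0.01 * r[1] + 0.2 * r[2] for r in rows]

print(predict_stopping_distance(reading, SELECTED, toy_predict))
```

Failing loudly on a missing feature is a deliberate design choice here: silently filling a gap would shift the remaining values into the wrong columns.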
To illustrate the technical effect of the invention, it was verified experimentally on a specific example using 476 test samples in total. Table 2 compares the predicted and actual stopping distances for part of the samples in this embodiment.
TABLE 2 (image not reproduced): predicted versus actual stopping distance for selected samples
As Table 2 shows, the stopping distances predicted by the method are very close to the true values. Statistically, the average error over the test samples is 0.0000087 mm, which fully meets the requirements of practical application.
Table 3 compares the stopping distance predictions of the xgboost model with and without the feature extraction of the invention.
TABLE 3 (image not reproduced): stopping distance prediction of the xgboost model with and without feature extraction
As Table 3 shows, extracting the representative vehicle-mounted data features yields feature data that better reflect the characteristics of the rail transit historical vehicle-mounted data, which improves the performance of the constructed xgboost model. Compared with a conventional xgboost model built directly on the original vehicle-mounted data without feature extraction, the xgboost model obtained by the method performs better on all three evaluation metrics: mean squared error (MSE), R-squared (coefficient of determination) and mean absolute error (MAE).
Although illustrative embodiments of the invention have been described above to help those skilled in the art understand it, the invention is not limited to the scope of these embodiments. Various changes that fall within the spirit and scope of the invention as defined by the appended claims will be apparent to those skilled in the art, and all inventions that make use of the inventive concept are protected.
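The three evaluation metrics named above can be computed as follows. This is a NumPy sketch with made-up numbers and does not reproduce Table 3. Here $\mathrm{MSE} = \frac{1}{N}\sum_n (d_n - \hat d_n)^2$, $\mathrm{MAE} = \frac{1}{N}\sum_n |d_n - \hat d_n|$ and $R^2 = 1 - \sum_n (d_n - \hat d_n)^2 / \sum_n (d_n - \bar d)^2$.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, R-squared and MAE for 1-D arrays of true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    # R-squared is undefined when all true values are equal (ss_tot == 0).
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - float(np.sum(err ** 2)) / ss_tot
    return {"MSE": mse, "R2": r2, "MAE": mae}

print(regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
# {'MSE': 0.3333333333333333, 'R2': 0.5, 'MAE': 0.3333333333333333}
```

Lower MSE and MAE and an R-squared closer to 1 indicate a better fit, which is the sense in which Table 3 reports the feature-extracted model as better.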

Claims (1)

1. A rail transit vehicle-mounted data prediction method based on an xgboost model is characterized by comprising the following steps:
S1: setting M vehicle-mounted data features of the rail transit system according to actual needs, collecting the values of the M features at each of N stops during actual rail transit operation, and denoting the value of the m-th feature at the n-th stop as $f_{nm}$, $n = 1, 2, \ldots, N$, $m = 1, 2, \ldots, M$; forming the M feature values obtained at each stop into one piece of vehicle-mounted data $F_n = \{f_{n1}, f_{n2}, \ldots, f_{nM}\}$, and simultaneously recording the distance $d_n$ between the train door and the platform screen door when the stop is completed as the label of that piece of vehicle-mounted data;
S2: constructing a CART decision tree from the N pieces of rail transit vehicle-mounted data obtained in step S1 and their labels; traversing the generated CART decision tree level by level from the root node to the leaf nodes and extracting the vehicle-mounted data feature used as the splitting point at each node, these being the representative vehicle-mounted data features, whose number is denoted P; and extracting the values of the P representative features from the original N pieces of vehicle-mounted data, the N pieces of data so extracted being the feature-extracted vehicle-mounted data;
S3: constructing an xgboost model from the feature-extracted vehicle-mounted data and their labels;
S4: during rail transit operation, collecting the values of the P representative vehicle-mounted data features at the current moment and inputting them into the xgboost model to obtain a predicted stopping distance.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010460661.8A CN111626508B (en) 2020-05-27 2020-05-27 Track traffic vehicle-mounted data prediction method based on xgboost model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010460661.8A CN111626508B (en) 2020-05-27 2020-05-27 Track traffic vehicle-mounted data prediction method based on xgboost model

Publications (2)

Publication Number Publication Date
CN111626508A 2020-09-04
CN111626508B 2023-12-22

Family

ID=72271918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010460661.8A Active CN111626508B (en) 2020-05-27 2020-05-27 Track traffic vehicle-mounted data prediction method based on xgboost model

Country Status (1)

Country Link
CN (1) CN111626508B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108551167A (en) * 2018-04-25 2018-09-18 浙江大学 A kind of electric power system transient stability method of discrimination based on XGBoost algorithms
CN110610016A (en) * 2019-07-15 2019-12-24 广东毓秀科技有限公司 Method for predicting rail transit stopping problem based on big data machine learning
CN110543988A (en) * 2019-08-28 2019-12-06 上海电力大学 Photovoltaic short-term output prediction system and method based on XGboost algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
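Zhang Shanji (张杉基): "Short-term passenger flow prediction for urban rail transit based on XGBoost" (基于XGBoost的城市轨道交通短时客流预测), Qinghai Jiaotong Keji (青海交通科技), no. 01 *
Hu Zhenwei (胡臻伟): "Analysis and application of intrusion detection based on the XGBoost algorithm" (基于XGBoost算法的入侵检测分析与应用), Master's theses electronic journal (硕士电子期刊), vol. 2020, no. 01 *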

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113570862A (en) * 2021-07-28 2021-10-29 太原理工大学 XGboost algorithm-based large traffic jam early warning method
CN113570862B (en) * 2021-07-28 2022-05-10 太原理工大学 XGboost algorithm-based large traffic jam early warning method

Also Published As

Publication number Publication date
CN111626508B (en) 2023-12-22

Similar Documents

Publication Publication Date Title
CN110597735B (en) Software defect prediction method for open-source software defect feature deep learning
CN107563426B (en) Method for learning locomotive running time sequence characteristics
CN111381170A (en) Electric vehicle battery pack health state prediction method and system based on big data
CN108459955B (en) Software defect prediction method based on deep self-coding network
CN110166484A (en) A kind of industrial control system intrusion detection method based on LSTM-Attention network
CN108665093B (en) Deep learning-based expressway traffic accident severity prediction method
CN111274817A (en) Intelligent software cost measurement method based on natural language processing technology
CN113688558A (en) Automobile driving condition construction method and system based on large database samples
CN114139624A (en) Method for mining time series data similarity information based on integrated model
CN112529678A (en) Financial index time sequence abnormity detection method based on self-supervision discriminant network
CN116631186A (en) Expressway traffic accident risk assessment method and system based on dangerous driving event data
CN116257759A (en) Structured data intelligent classification grading system of deep neural network model
CN116150191A (en) Data operation acceleration method and system for cloud data architecture
CN111626508A (en) Rail transit vehicle-mounted data prediction method based on xgboost model
CN117391084A (en) Data management method and system based on DCMM system and deep learning
CN116756825A (en) Group structural performance prediction system for middle-small span bridge
CN115184054B (en) Mechanical equipment semi-supervised fault detection and analysis method, device, terminal and medium
CN115221045A (en) Multi-target software defect prediction method based on multi-task and multi-view learning
CN115081741A (en) Natural gas metrological verification intelligent prediction method based on neural network
CN114077663A (en) Application log analysis method and device
CN114638558B (en) Data set classification method for operation accident analysis of comprehensive energy system
CN117573655B (en) Data management optimization method and system based on convolutional neural network
CN117828539B (en) Intelligent data fusion analysis system and method
CN113778733B (en) Log sequence anomaly detection method based on multi-scale MASS
CN118075090A (en) Network fault prediction method based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant