CN111626508B - Track traffic vehicle-mounted data prediction method based on xgboost model - Google Patents

Info

Publication number
CN111626508B
Authority
CN
China
Prior art keywords
vehicle
mounted data
data
rail transit
xgboost model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010460661.8A
Other languages
Chinese (zh)
Other versions
CN111626508A (en)
Inventor
王晓玲
李欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Tourism & Hospitality (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a rail transit vehicle-mounted data prediction method based on an xgboost model. Rail transit vehicle-mounted data is first collected, and representative vehicle-mounted data features are selected from all vehicle-mounted data features of the rail transit based on a CART decision tree. The values of these representative features are extracted from the original vehicle-mounted data and serve as the vehicle-mounted data after feature extraction. An xgboost model is then constructed from the vehicle-mounted data after feature extraction and its corresponding labels, and vehicle-mounted data collected during actual rail transit operation is input into the xgboost model to obtain a prediction of the parking distance. Because the representative features are extracted with a CART decision tree and the xgboost model is constructed from the vehicle-mounted data after feature extraction, the accuracy of rail transit parking distance prediction can be effectively improved.

Description

Track traffic vehicle-mounted data prediction method based on xgboost model
Technical Field
The invention belongs to the technical field of rail transit, and particularly relates to a rail transit vehicle-mounted data prediction method based on an xgboost model.
Background
Rail transit is increasingly an indispensable part of urban life. Hundreds of sensors are distributed on trains and along lines to monitor various data while trains are running, and determining the causes of train faults and errors from this data by manual analysis involves a huge workload. Analysis of the sensor data also helps adjust train operating parameters in time and provides passengers with a better travel experience. Data analysis is becoming increasingly important to companies of all kinds, and its primary task is to analyze historical data and make predictions about the future on that basis.
Rail transit vehicle-mounted data is representative of the data formats found in most application fields: the data volume is huge, the features are numerous, and the data types are rich. Because rail transit is an indispensable means of urban travel, analysis of rail transit data is an indispensable part of rail transit operation. However, as times and technology change, traditional manual analysis can no longer cope with the growing data volume and new analysis requirements. With the rapid development of artificial intelligence and machine learning, data-driven business keeps growing, and it has become common industry practice to use machine learning algorithms for data cleaning, feature selection and feature combination, and to construct models for analyzing massive data.
General methods for feature extraction from rail transit data include principal component analysis (PCA) and the correlation coefficient method. PCA can compress a massive number of features while retaining the important ones, but it is suitable only when the variables are strongly correlated; for weakly correlated data its feature extraction effect is not ideal. In addition, a small amount of information may be lost during feature extraction and the meaning of the data may change, so the result is less interpretable than the original data. The correlation coefficient method is sensitive to how the data is organized and requires the data to be linearly correlated; if the relationship is nonlinear, for example a square relationship, the correlation coefficient can be very small.
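The limitation of the correlation coefficient method on nonlinear relationships can be checked with a few lines of Python. The sketch below is only an illustration on made-up values, not rail transit data; the helper name `pearson` is an assumption introduced here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative values, symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # y depends linearly on x
square = [x * x for x in xs]       # y is fully determined by x, but not linearly

r_linear = pearson(xs, linear)     # strong linear correlation
r_square = pearson(xs, square)     # the linear correlation vanishes
```

Although `square` is completely determined by `xs`, its correlation coefficient with `xs` is zero on this symmetric sample, so a filter based on the correlation coefficient alone would discard the feature.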
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a rail transit vehicle-mounted data prediction method based on an xgboost model, in which representative vehicle-mounted data features are extracted based on a CART decision tree and the xgboost model is constructed from the vehicle-mounted data after feature extraction, so as to improve the accuracy of rail transit parking distance prediction.
In order to achieve the aim of the invention, the rail transit vehicle-mounted data prediction method based on the xgboost model comprises the following steps:
S1: Set M rail transit vehicle-mounted data features according to actual requirements. During actual rail transit operation, collect the values of the M features at each of N stops, and denote the value of the $m$-th feature at the $n$-th stop by $f_{nm}$, where $n = 1, 2, \dots, N$ and $m = 1, 2, \dots, M$. The M feature values obtained at each stop form one piece of vehicle-mounted data $F_n = \{f_{n1}, f_{n2}, \dots, f_{nM}\}$. At the same time, record the distance $d_n$ between the train door and the platform screen door when the stop is completed, and use it as the label corresponding to that piece of vehicle-mounted data;
S2: Construct a CART decision tree from the N pieces of rail transit vehicle-mounted data obtained in step S1 and their corresponding labels. Then traverse the generated CART decision tree level by level from the root node to the leaf nodes and extract the vehicle-mounted data feature used as the split point at each split; these features are the representative vehicle-mounted data features, and their number is denoted P. Extract the values of the P representative features from the original N pieces of vehicle-mounted data; the resulting N pieces of vehicle-mounted data are the vehicle-mounted data after feature extraction;
S3: Construct an xgboost model from the vehicle-mounted data after feature extraction and its corresponding labels;
S4: During rail transit operation, collect the values of the P representative vehicle-mounted data features at the current moment and input them into the xgboost model to obtain a prediction of the parking distance.
In the rail transit vehicle-mounted data prediction method based on the xgboost model, rail transit vehicle-mounted data is first collected, and representative vehicle-mounted data features are selected from all vehicle-mounted data features of the rail transit based on a CART decision tree. The values of these representative features are extracted from the original vehicle-mounted data and serve as the vehicle-mounted data after feature extraction. The xgboost model is constructed from the vehicle-mounted data after feature extraction and its corresponding labels, and vehicle-mounted data collected during actual rail transit operation is input into the xgboost model to obtain a prediction of the parking distance. Because the representative features are extracted with a CART decision tree and the xgboost model is constructed from the vehicle-mounted data after feature extraction, the accuracy of rail transit parking distance prediction can be effectively improved.
Drawings
Fig. 1 is a flowchart of a specific embodiment of the rail transit vehicle-mounted data prediction method based on the xgboost model.
Detailed Description
The following describes embodiments of the invention with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note that detailed descriptions of known functions and designs are omitted below where they might obscure the main content of the invention.
Examples
Fig. 1 is a flowchart of a specific embodiment of the rail transit vehicle-mounted data prediction method based on the xgboost model. As shown in fig. 1, the specific steps of the rail transit vehicle-mounted data prediction method based on the xgboost model comprise:
S101: Collecting rail transit vehicle-mounted data:
Set M rail transit vehicle-mounted data features according to actual requirements. During actual rail transit operation, collect the values of the M features at each of N stops, and denote the value of the $m$-th feature at the $n$-th stop by $f_{nm}$, where $n = 1, 2, \dots, N$ and $m = 1, 2, \dots, M$. The M feature values obtained at each stop form one piece of vehicle-mounted data $F_n = \{f_{n1}, f_{n2}, \dots, f_{nM}\}$. At the same time, record the distance $d_n$ between the train door and the platform screen door when the stop is completed, and use it as the label corresponding to that piece of vehicle-mounted data.
Each piece of rail transit vehicle-mounted data therefore contains M-dimensional features and describes the state of the train at the corresponding moment, while its label, the distance between the train door and the platform screen door, is the outcome produced by that state. Table 1 shows an example of the vehicle-mounted data in this embodiment.
TABLE 1
The time series column gives the time at which the vehicle-mounted data was collected, which is also the time at which the train stopped.
S102: extracting representative vehicle-mounted data characteristics:
In practical applications the vehicle-mounted data features are extremely rich, so the value of M can be very large, yet many of these dimensions have little or no influence on the parking distance of the train. To simplify and speed up the calculation, the data that affects the parking distance must therefore be extracted from this mass of data. Representative feature extraction means selecting, from a large number of features, those that capture the characteristics of the data. Because the correlation coefficient method and PCA both have limitations, the invention provides a method for extracting representative vehicle-mounted data features based on a CART (Classification And Regression Trees) decision tree, as follows:
Construct a CART decision tree from the N pieces of rail transit vehicle-mounted data obtained in step S101 and their corresponding labels. Then traverse the generated CART decision tree level by level from the root node to the leaf nodes and extract the vehicle-mounted data feature used as the split point at each split; these features are the representative vehicle-mounted data features, and their number is denoted P. Extract the values of the P representative features from the original N pieces of vehicle-mounted data; the resulting N pieces of vehicle-mounted data are the vehicle-mounted data after feature extraction.
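The level-order extraction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Node` structure, the function names and the hand-built tree are hypothetical, and the sketch lists each split feature once, in the order in which it is first met during the traversal.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """One node of an already fitted regression tree (hypothetical structure)."""
    feature: Optional[int] = None      # index of the split feature; None for a leaf
    threshold: Optional[float] = None  # split value; None for a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[float] = None      # predicted label stored at a leaf

def representative_features(root):
    """Collect split-feature indices level by level, each listed once."""
    seen, ordered = set(), []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node is None or node.feature is None:
            continue  # leaves carry no split feature
        if node.feature not in seen:
            seen.add(node.feature)
            ordered.append(node.feature)
        queue.append(node.left)
        queue.append(node.right)
    return ordered

def project(rows, feature_indices):
    """Keep only the selected feature columns of every record."""
    return [[row[i] for i in feature_indices] for row in rows]

# Hand-built example tree over M = 5 features (illustrative values only).
tree = Node(
    feature=2, threshold=0.5,
    left=Node(feature=0, threshold=1.0,
              left=Node(value=0.1), right=Node(value=0.2)),
    right=Node(feature=2, threshold=0.9,
               left=Node(value=0.3),
               right=Node(feature=4, threshold=3.0,
                          left=Node(value=0.4), right=Node(value=0.5))),
)

selected = representative_features(tree)   # P = len(selected)
rows = [[10, 11, 12, 13, 14], [20, 21, 22, 23, 24]]
reduced = project(rows, selected)          # data after feature extraction
```

In this example features 1 and 3 never appear as split points, so they are dropped and each record shrinks from M = 5 to P = 3 values.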
A CART decision tree is constructed by recursively dividing each region of the input space in which the training set lies into two sub-regions and determining an output value on each sub-region. The method can be briefly described as follows:
For the original vehicle-mounted data containing M features, traverse every value of every feature and use that value to divide the N pieces of vehicle-mounted data into two sets. Calculate the mean square error of each set and search for the value that minimizes the sum of the two mean square errors. The feature to which this value belongs is the optimal split feature at this split point, and the value itself is the optimal split value. The two resulting sets are then divided again in the same way, by searching for their optimal split features and split values, until the termination condition is reached.
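A single step of this split search can be sketched in Python as follows. The sketch makes two assumptions that the text does not fix: each set is scored by the sum of squared deviations of its labels from their mean, which is the usual CART regression criterion, and a record goes to the left set when its feature value is less than or equal to the candidate value. The function names and the toy data are illustrative only, and the recursion and termination condition are left out.

```python
def sse(labels):
    """Sum of squared deviations of the labels from their mean."""
    if not labels:
        return 0.0
    mean = sum(labels) / len(labels)
    return sum((y - mean) ** 2 for y in labels)

def best_split(rows, labels):
    """Try every observed value of every feature as a split value and return
    (feature_index, split_value, error) for the split with the smallest total
    error, or None when no split produces two non-empty sets."""
    best = None
    for m in range(len(rows[0])):
        for value in sorted({row[m] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[m] <= value]
            right = [y for row, y in zip(rows, labels) if row[m] > value]
            if not left or not right:
                continue  # a valid split needs two non-empty sets
            error = sse(left) + sse(right)
            if best is None or error < best[2]:
                best = (m, value, error)
    return best

# Toy data: N = 4 records, M = 2 features, labels in arbitrary units.
rows = [[1.0, 7.0], [2.0, 3.0], [3.0, 9.0], [4.0, 1.0]]
labels = [0.10, 0.12, 0.50, 0.52]

feature, value, error = best_split(rows, labels)
```

On this toy data the labels fall into two groups that feature 0 separates cleanly at the value 2.0, so feature 0 is chosen as the optimal split feature; a full tree would apply `best_split` again to each of the two resulting sets.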
S103: constructing an xgboost model:
An xgboost model is constructed from the vehicle-mounted data after feature extraction and its corresponding labels.
The xgboost model has been a widely used learning model in recent years. Following the idea of ensemble learning, it combines a number of models, makes good use of the training result of the previous model by further training on its residual, and performs well on most regression and classification problems. The xgboost model is iterative and contains a number of CART decision trees, each of which is generated by fitting the residual left by the previous decision tree. For the specific principle and construction process of the xgboost model, see Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System", in 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016.
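The idea that each new tree fits the residual of the trees before it can be illustrated with a deliberately simplified Python sketch. It is plain gradient boosting with squared loss, depth-1 trees and a single input feature; it is not the XGBoost algorithm itself, which adds regularization and a second-order objective. All names, parameters and data values are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree on one feature by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the maximum so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Build trees one after another, each trained on the current residuals."""
    base = sum(ys) / len(ys)            # initial prediction: the label mean
    predictions = [base] * len(xs)
    trees, history = [], []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x)
                       for x, p in zip(xs, predictions)]
        # training mean squared error after adding this tree
        history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))

    def predict(x):
        return base + learning_rate * sum(tree(x) for tree in trees)

    return predict, history

# Toy one-feature data (illustrative values only).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0.10, 0.12, 0.20, 0.22, 0.40, 0.42, 0.60, 0.62]
predict, history = boost(xs, ys)
```

The `history` list records the training error after each tree is added; because every stump is a least-squares fit to the current residuals, this error never increases from one round to the next.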
To improve the performance of the constructed xgboost model, the vehicle-mounted data after feature extraction can be divided into a training set and a test set. Each decision tree in the xgboost model is first constructed with the training set and then tested with the test set, and decision trees with large errors can additionally be pruned.
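The division into a training set and a test set might look like the following Python sketch. The 80/20 ratio, the fixed seed and the function name are assumptions; the text does not state how the split was made, and the per-tree testing and pruning are not shown.

```python
import random

def train_test_split(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle the record indices and split records and labels consistently."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)   # fixed seed for a repeatable split
    n_test = max(1, int(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)

# Illustrative data: 10 records with P = 2 features each.
rows = [[float(i), float(i % 3)] for i in range(10)]
labels = [0.1 * i for i in range(10)]

(train_rows, train_labels), (test_rows, test_labels) = train_test_split(rows, labels)
```

Keeping the test records out of tree construction is what makes the measured error a fair estimate of how the model behaves on stops it has not seen.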
S104: Predicting the parking distance:
During rail transit operation, the values of the P representative vehicle-mounted data features at the current moment are collected and input into the xgboost model to obtain a prediction of the parking distance.
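At prediction time only the P representative features need to be read from the current vehicle-mounted data. The Python sketch below shows that selection step; `toy_model` is a hypothetical stand-in for the trained xgboost model, and the feature indices and values are invented for illustration.

```python
def predict_parking_distance(model, reading, feature_indices):
    """Pick the P representative features out of a full M-dimensional reading
    and pass them to the trained model."""
    selected = [reading[i] for i in feature_indices]
    return model(selected)

def toy_model(features):
    """Hypothetical stand-in for a trained model: any callable that takes the
    P selected feature values and returns a distance."""
    return 0.01 * features[0] + 0.002 * features[1]

# One reading with M = 5 features; features 2 and 0 are assumed to be the
# representative ones, in the order used during training.
reading = [12.0, 0.3, 5.0, 80.0, 1.5]
prediction = predict_parking_distance(toy_model, reading, feature_indices=[2, 0])
```

The feature order passed to the model has to match the order used when the model was trained, which is why the indices are kept as an explicit list.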
To better illustrate the technical effect of the invention, it was verified experimentally on a specific example, using a total of 476 test samples. Table 2 compares the predicted and actual values for some of the parking distances in this embodiment.
TABLE 2
As shown in Table 2, the predicted and actual parking distances obtained by the method are very close. The average error over the test samples is 0.0000087 mm, so the method fully meets the requirements of practical application.
Table 3 compares the parking distance prediction performance of xgboost models constructed before and after the feature extraction of the invention.
TABLE 3
As shown in Table 3, extracting the representative features from the historical rail transit vehicle-mounted data improves the performance of the resulting xgboost model. Compared with a conventional xgboost model constructed directly from the original vehicle-mounted data without feature extraction, the xgboost model obtained by the invention performs better on all three evaluation metrics: MSE (mean squared error), R-Square (coefficient of determination) and MAE (mean absolute error).
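For reference, the three evaluation metrics can be computed as in the Python sketch below, using their standard definitions. The sample values are illustrative and are not the experimental results of the patent.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

scores = {
    "MSE": mse(y_true, y_pred),
    "MAE": mae(y_true, y_pred),
    "R-Square": r_squared(y_true, y_pred),
}
```

Lower MSE and MAE indicate smaller errors, while an R-Square closer to 1 means the model explains more of the variation in the parking distance.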
While illustrative embodiments of the invention have been described above to help those skilled in the art understand it, the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, all changes that fall within the spirit and scope of the invention as defined by the appended claims are protected.

Claims (1)

1. The rail transit vehicle-mounted data prediction method based on the xgboost model is characterized by comprising the following steps of:
S1: setting M rail transit vehicle-mounted data features according to actual requirements; during actual rail transit operation, collecting the values of the M features at each of N stops, and denoting the value of the $m$-th feature at the $n$-th stop by $f_{nm}$, where $n = 1, 2, \dots, N$ and $m = 1, 2, \dots, M$; the M feature values obtained at each stop forming one piece of vehicle-mounted data $F_n = \{f_{n1}, f_{n2}, \dots, f_{nM}\}$; and at the same time recording the distance $d_n$ between the train door and the platform screen door when the stop is completed, and using it as the label corresponding to that piece of vehicle-mounted data;
S2: constructing a CART decision tree from the N pieces of rail transit vehicle-mounted data obtained in step S1 and their corresponding labels; then traversing the generated CART decision tree level by level from the root node to the leaf nodes and extracting the vehicle-mounted data feature used as the split point at each split, these features being the representative vehicle-mounted data features and their number being denoted P; and extracting the values of the P representative features from the original N pieces of vehicle-mounted data, the resulting N pieces of vehicle-mounted data being the vehicle-mounted data after feature extraction;
S3: constructing an xgboost model from the vehicle-mounted data after feature extraction and its corresponding labels;
S4: during rail transit operation, collecting the values of the P representative vehicle-mounted data features at the current moment and inputting them into the xgboost model to obtain a prediction of the parking distance.
CN202010460661.8A 2020-05-27 2020-05-27 Track traffic vehicle-mounted data prediction method based on xgboost model Active CN111626508B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010460661.8A CN111626508B (en) 2020-05-27 2020-05-27 Track traffic vehicle-mounted data prediction method based on xgboost model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010460661.8A CN111626508B (en) 2020-05-27 2020-05-27 Track traffic vehicle-mounted data prediction method based on xgboost model

Publications (2)

Publication Number Publication Date
CN111626508A CN111626508A (en) 2020-09-04
CN111626508B true CN111626508B (en) 2023-12-22

Family

ID=72271918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010460661.8A Active CN111626508B (en) 2020-05-27 2020-05-27 Track traffic vehicle-mounted data prediction method based on xgboost model

Country Status (1)

Country Link
CN (1) CN111626508B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113570862B (en) * 2021-07-28 2022-05-10 太原理工大学 XGboost algorithm-based large traffic jam early warning method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108551167A (en) * 2018-04-25 2018-09-18 浙江大学 A kind of electric power system transient stability method of discrimination based on XGBoost algorithms
CN110543988A (en) * 2019-08-28 2019-12-06 上海电力大学 Photovoltaic short-term output prediction system and method based on XGboost algorithm
CN110610016A (en) * 2019-07-15 2019-12-24 广东毓秀科技有限公司 Method for predicting rail transit stopping problem based on big data machine learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
基于XGBoost的城市轨道交通短时客流预测;张杉基;;青海交通科技(01);全文 *
基于XGBoost算法的入侵检测分析与应用;胡臻伟;硕士电子期刊;第2020年卷(第01期);全文 *

Also Published As

Publication number Publication date
CN111626508A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
CN107563426B (en) Method for learning locomotive running time sequence characteristics
CN111381170A (en) Electric vehicle battery pack health state prediction method and system based on big data
CN108459955B (en) Software defect prediction method based on deep self-coding network
CN112949715A (en) SVM (support vector machine) -based rail transit fault diagnosis method
CN110084534B (en) Driving risk factor quantification method based on driving behavior portrait
CN105279964A (en) Road network traffic data completion method based on low-order algorithm
CN113688558B (en) Automobile driving condition construction method and system based on large database sample
CN106528417A (en) Intelligent detection method and system of software defects
CN113506269B (en) Turnout and non-turnout rail fastener positioning method based on deep learning
CN111626508B (en) Track traffic vehicle-mounted data prediction method based on xgboost model
CN116631186A (en) Expressway traffic accident risk assessment method and system based on dangerous driving event data
CN117131449A (en) Data management-oriented anomaly identification method and system with propagation learning capability
CN114818353A (en) Train control vehicle-mounted equipment fault prediction method based on fault characteristic relation map
CN110598747A (en) Road classification method based on self-adaptive K-means clustering algorithm
CN117828539A (en) Intelligent data fusion analysis system and method
CN117852541A (en) Entity relation triplet extraction method, system and computer equipment
CN114416686B (en) Vehicle equipment fingerprint CARID identification system and identification method
CN113361624A (en) Machine learning-based sensing data quality evaluation method
Xu et al. Rail defect detection method based on BP neural network
CN115808504B (en) Online drift compensation method for gas sensor for concentration prediction
CN114638558B (en) Data set classification method for operation accident analysis of comprehensive energy system
CN113420387B (en) Migration diagnosis method and system for rolling bearing of compacting machine
CN115460097B (en) Fusion model-based mobile application sustainable trust evaluation method and device
Gao et al. Analysis of the Abnormality of Traction Energy Consumption in Urban Rail Transit System
CN118075090A (en) Network fault prediction method based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant