CN117521882B - Method for predicting urban rail transit accident result based on integrated learning model - Google Patents


Info

Publication number
CN117521882B
CN117521882B (application CN202311449164.8A)
Authority
CN
China
Prior art keywords
accident
bert
cnn
rail transit
urban rail
Prior art date
Legal status: Active (assumed by Google Patents; not a legal conclusion)
Application number
CN202311449164.8A
Other languages
Chinese (zh)
Other versions
CN117521882A
Inventor
刘杰
李欣垚
何明卫
刘尉艺
李文新
税文兵
谢俊平
Current Assignee
Kunming University of Science and Technology
Hubei University of Arts and Science
Original Assignee
Kunming University of Science and Technology
Hubei University of Arts and Science
Priority date
Filing date
Publication date
Application filed by Kunming University of Science and Technology and Hubei University of Arts and Science
Priority to CN202311449164.8A
Publication of application CN117521882A
Application granted
Publication of granted patent CN117521882B
Legal status: Active

Classifications

    • G06Q10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06N3/045 — Combinations of networks
    • G06N3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N3/098 — Distributed learning, e.g. federated learning
    • G06Q10/06375 — Prediction of business process outcome or impact based on a proposed change
    • G06Q50/26 — Government or public services
    • Y02T10/40 — Engine management systems


Abstract

The invention relates to the technical field of urban rail transit, and in particular to a method for predicting urban rail transit accident consequences based on an ensemble learning model. The method establishes an ensemble learning model (EMBC) built on a convolutional neural network (CNN) and a BERT model. In EMBC, the CNN extracts effective information from numerical data (such as the train running speed and the line on which the train operates), BERT learns the complex relationships in the textual accident descriptions, and a Bagging method with self-learned weights aggregates the classification results of BERT and CNN to produce the final prediction of the urban rail transit accident outcome. The method achieves higher prediction accuracy and applies to a wider range of scenarios.

Description

Method for predicting urban rail transit accident result based on integrated learning model
Technical Field
The invention relates to the technical field of urban rail transit, in particular to a method for predicting urban rail transit accident results based on an integrated learning model.
Background
Urban rail transit plays a vital role in relieving urban traffic congestion and promoting green travel. Owing to its safety, efficiency and environmental friendliness, it has rapidly become a principal travel mode for urban residents. However, in daily operation, unavoidable accidents cause train delays or cancellations and can even interrupt the operation of the entire urban rail network, with serious negative effects on passenger travel and the functioning of the city. Facing this challenge, deep analysis and mining of historical accident data makes it possible not only to understand accurately how accidents arise but also to predict their consequences accurately. Historical accident data, however, contain large amounts of heterogeneous data (time, text, numerical values, and so on), which increases the complexity of data mining and accident-outcome prediction. There is therefore an urgent need for an advanced urban rail transit accident-outcome prediction method that fully exploits the heterogeneous information in historical accident data to predict accident outcomes accurately.
Disclosure of Invention
The invention provides a method for predicting urban rail transit accident outcomes based on an ensemble learning model, with higher prediction accuracy and effectiveness.
According to the method for predicting urban rail transit accident outcomes based on an ensemble learning model, an ensemble learning model (EMBC) based on a convolutional neural network (CNN) and a BERT model is established. In EMBC, the CNN captures spatial patterns in the accident data while BERT learns the complex relationships in the textual accident descriptions; a Bagging method then aggregates the classification results of BERT and CNN to produce the final prediction of the urban rail transit accident outcome.
Preferably, in the CNN, a fully connected layer with 7 units is followed by a softmax function that generates the predicted probability of each accident outcome; the CNN is trained with a binary cross-entropy loss function and optimized with the Adam algorithm.
Preferably, in BERT, each Transformer layer uses a self-attention mechanism to learn the relationships between the different tokens, and a multi-head attention mechanism lets the model attend simultaneously to information from different token subspaces at different positions:

MultiHead(Q, K, V) = Concat(head_1, ..., head_n) W^O
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)

where Q, K and V denote the query, key and value embeddings, respectively, and W_i^Q, W_i^K, W_i^V and W^O are learnable parameter matrices.

The output of BERT comes from the top Transformer encoder. The [CLS] token, a special symbol placed at the beginning of the text, aggregates the sequence, and its final representation is used for the classification task; this [CLS] representation is passed to a classification layer, and BERT uses a softmax layer to separate the outcome types of urban rail transit accidents.
Preferably, during training EMBC takes as input a triplet D(S, T, L), where S is an n × d set of vectors, n being the total number of accident records and d the number of accident features; T is the set of accident-description strings; and L is the set of accident-outcome labels.
To better represent the different categories of statistical data, each S_i is one-hot encoded to produce a vector denoted S_Oi. Principal component analysis is then applied to reduce the dimensionality of this high-dimensional data; the reduced dimension is denoted β and serves as a base hyperparameter of the model. The reduced vector is denoted S_ri. The processing and dimensionality-reduction step is therefore expressed as:

S_ri = PCA_β(ONEHOT(S_i))

where ONEHOT is the one-hot encoding function and PCA is the principal-component-analysis reduction function; S_i ∈ R^d and S_ri ∈ R^β.
Preferably, the ensemble-learning Bagging step uses weighted aggregation, so the output of EMBC is defined as:

Y = softmax(η_1 · F_CNN(S_r) + η_2 · F_BERT(T))

where F_CNN is the output of the trained CNN, F_BERT is the output of the trained BERT, and η_1 and η_2 are weights associated with the model outputs; Y is an n-dimensional vector, and the softmax function maps the aggregated value for each class into the interval (0, 1).
To effectively address the heterogeneity of historical urban rail transit accident data, the invention proposes a new method, EMBC, that uses historical operating-accident data to improve the accuracy of accident-consequence prediction. EMBC is designed to take full advantage of both the numerical data and the descriptive text to provide comprehensive analytical capability. In addition, the invention enhances the integration of different models beyond the traditional Bagging method to meet the specific requirements of heterogeneous data sets. Experiments show that the proposed EMBC attains higher prediction accuracy and effectiveness than other models (such as the multi-layer perceptron, the support vector machine and the Bayesian model).
Drawings
FIG. 1 is a flow chart of EMBC for predicting the consequences of an urban rail transit accident in an embodiment;
FIG. 2 is a frame diagram of BERT in an embodiment;
FIG. 3 is a statistical diagram of the number of different types of accidents and the number of accidents in different time periods according to the embodiment;
FIG. 4 is a schematic diagram of the number of incidents at different times in an embodiment;
FIG. 5 is a schematic representation of the ROC curve of a model test in an embodiment;
FIG. 6 is a weight-tuning accuracy curve for the base models of EMBC in an example;
FIG. 7 is a graph showing the comparison of the predicted performance of different models in the examples.
Detailed Description
For a further understanding of the present invention, the invention is described in detail with reference to the drawings and examples. It is to be understood that the examples illustrate the invention and do not limit it.
Examples
As shown in FIG. 1, this embodiment provides a method for predicting urban rail transit accident consequences based on an ensemble learning model, establishing an ensemble learning model (EMBC) built on a convolutional neural network (CNN) and a BERT model. In EMBC, the CNN captures spatial patterns in the accident data: it extracts key features from the accident records and identifies the spatial patterns relevant to particular accident consequences. BERT learns the complex relationships in the accident text descriptions; training BERT on the accident descriptions extracts important semantic information from the text. A Bagging method then aggregates the classification results of BERT and CNN to obtain the final urban rail transit accident-outcome prediction.
Historical urban rail transit accident logs typically contain heterogeneous data (e.g., speed, route, weather conditions, accident descriptions). A summary of the urban rail transit accident log used in this embodiment is shown in Table 1. A conventional machine learning model, however, can process only one type of data: a CNN handles only numerical data such as vehicle speed, line type, temperature and time, while BERT handles only text data such as the accident descriptions. As a result, much of the valuable information in heterogeneous data cannot be fully used for prediction. An ensemble learning model (EMBC) based on the CNN and the BERT model is therefore established so that the heterogeneous data can be fully exploited for accident-outcome prediction.
Table 1. Summary of the heterogeneous urban rail transit accident log
The urban rail transit accident data set is firstly subjected to data processing and then enters EMBC for training. The data processing includes data cleaning, feature encoding, tag encoding, etc.
CNN
The base learner used in this embodiment to extract information from the numerical data is the CNN. A CNN uses convolution filters to extract features from input data efficiently and learns to detect important patterns and relationships in it. CNNs have accordingly been applied in the traffic domain, for example to traffic-flow prediction and traffic-accident prediction, with good performance.
In a CNN, a convolution layer applies a set of convolution filters to the input data to generate a set of output feature maps. Each filter is a small weight matrix, learned during training, that is applied to the input in a sliding-window fashion, producing a new value at each location. For a filter of size p × q, the forward propagation can be expressed as:

x_l(i, j) = f( Σ_{a=0}^{p-1} Σ_{b=0}^{q-1} w_l(a, b) · x_{l-1}(i + a, j + b) + b_l )

where x_l is the output feature map of layer l, of size h × w; w_l is the weight matrix of the filters of layer l; b_l is the bias of layer l; and f(·) is the activation function.
The pooling layer is used for reducing the output feature mapping, so that the dimension of data is reduced, and the calculation efficiency of the model is improved. The fully connected layer is used to generate the final output of the model, such as classification or regression prediction.
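The sliding-window forward pass above can be sketched in a few lines of NumPy. This is a minimal illustrative sketch of a single-filter valid convolution with a ReLU activation, not the patent's implementation; names and the sample filter are assumptions.

```python
import numpy as np

def conv2d_forward(x_prev, w, b, f=lambda z: np.maximum(z, 0.0)):
    """Valid 2D convolution: x_l(i,j) = f(sum_{a,b} w(a,b) * x_{l-1}(i+a, j+b) + b)."""
    p, q = w.shape
    h_out = x_prev.shape[0] - p + 1
    w_out = x_prev.shape[1] - q + 1
    out = np.empty((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            # One sliding-window position: elementwise product plus bias.
            out[i, j] = np.sum(w * x_prev[i:i + p, j:j + q]) + b
    return f(out)

x = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 input feature map
k = np.ones((2, 2)) / 4.0                     # 2x2 averaging filter
y = conv2d_forward(x, k, b=0.0)
print(y.shape)  # (3, 3): a p x q filter shrinks each side by p-1, q-1
```

A pooling layer would then shrink `y` further (e.g. a 2×2 max over non-overlapping windows) before the fully connected layer produces the classification output.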
Backward propagation derives the gradient of each weight by the chain rule:

∂E/∂w_l(a, b) = Σ_i Σ_j ∂E/∂x_l(i, j) · ∂x_l(i, j)/∂w_l(a, b)

and, from the forward formula, the summed term can be expressed as:

∂x_l(i, j)/∂w_l(a, b) = f′(·) · x_{l-1}(i + a, j + b)
The CNN architecture used in this embodiment follows the ResNet design: it starts with a 7×7 convolution layer followed by max pooling and consists of four residual blocks, each with two sets of 3×3 convolution, batch normalization and ReLU activation. The numbers of filters in these blocks are 64, 128, 256 and 512, respectively, and the first layer of each of the last three blocks downsamples with stride 2. For the task of this embodiment, a fully connected layer with 7 units is followed by a softmax function that produces the accident-outcome prediction probabilities. The CNN is trained with a binary cross-entropy loss function and optimized with the Adam algorithm.
BERT
BERT is a pre-trained deep learning model for natural language processing tasks, developed by Google AI in 2018. Given its excellent performance across NLP tasks, this embodiment uses BERT as one of the base learners to extract semantic information from the accident descriptions.
BERT can be viewed as a stack of Transformer layers. Unlike traditional deep learning models based on recurrent neural networks or CNNs, the Transformer is built entirely on the attention mechanism. Each layer consists of a multi-head self-attention sub-layer and a fully connected sub-layer. Each multi-head attention block attempts to capture the dependency between every pair of positions simultaneously through linear transformations; because self-attention is based on linear projections, the fully connected sub-layer is introduced to learn nonlinear dependencies.
In each Transformer layer, a self-attention mechanism learns the relationships between the different tokens. By mapping a query and a set of key-value pairs to an output, the attention mechanism lets the model focus on specific features of the input data, enabling a more detailed understanding of complex patterns. The two most common attention functions are additive attention and dot-product attention. In practice dot-product attention is faster and more memory-efficient, so the common scaled dot-product attention is applied here:

Attention(Q, K, V) = softmax(Q K^T / √d_k) V

where Q is the query matrix, K is the key matrix and V is the value matrix. When the dot products become large, the softmax function has very small gradients, so the dot products are scaled by 1/√d_k to counteract this effect.
Furthermore, this embodiment uses multi-head attention so that the model can jointly process information from different token subspaces at different positions:

MultiHead(Q, K, V) = Concat(head_1, ..., head_n) W^O
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)

where W_i^Q, W_i^K, W_i^V and W^O are learnable parameter matrices. As in the original BERT, this embodiment uses h = 8 and d_k = d_v = d/h = 64.
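The scaled dot-product and multi-head formulas above can be sketched directly in NumPy. This is an illustrative sketch under the stated shapes, not BERT's actual implementation; the random projection matrices and dimensions are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head(X, Wq, Wk, Wv, Wo, h):
    # Project, split the model dimension into h heads, attend per head,
    # concatenate the heads, and apply the output projection W^O.
    d_h = Wq.shape[1] // h
    Qs, Ks, Vs = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for i in range(h):
        s = slice(i * d_h, (i + 1) * d_h)
        heads.append(attention(Qs[:, s], Ks[:, s], Vs[:, s]))
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(0)
n, d, h = 5, 16, 4                 # 5 tokens, model dim 16, 4 heads
X = rng.normal(size=(n, d))
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) for _ in range(4))
out = multi_head(X, Wq, Wk, Wv, Wo, h)
print(out.shape)  # (5, 16): same shape as the input token embeddings
```

Each row of the softmax output is a probability distribution over tokens, which is what lets every position mix in information from every other position.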
The output of BERT comes from the top Transformer encoder. This embodiment focuses on the final representation of the [CLS] token, a special symbol placed at the beginning of the text whose final hidden state aggregates the sequence for the classification task; this [CLS] representation is passed to the classification layer, and BERT uses a softmax layer to classify the categories of urban rail transit accident consequences. The BERT architecture of this embodiment is shown in FIG. 2.
BERT and CNN based integrated model (EMBC)
During the training phase, the input of EMBC is defined as a triplet D(S, T, L), where S is an n × d set of vectors, n being the total number of accident records and d the number of accident features; S_i denotes the i-th record vector in S. T denotes the set of accident-description strings, with T_i the description of the i-th record. L denotes the set of accident-outcome labels, with l_i the label of the i-th record. Specifically, this embodiment trains its CNN with the tuple D(S, L) and trains BERT with the tuple D(T, L).
To better represent the different categories of statistical data, each S_i is one-hot encoded to produce a vector denoted S_Oi. Because the resulting dimensionality is very high, principal component analysis is used to reduce it; the reduced dimension is denoted β and serves as a base hyperparameter of the model. The reduced vector is denoted S_ri, so the process can be expressed as:

S_ri = PCA_β(ONEHOT(S_i))

where ONEHOT is the one-hot encoding function and PCA is the principal-component-analysis reduction function; S_i ∈ R^d and S_ri ∈ R^β.
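The S_ri = PCA_β(ONEHOT(S_i)) pipeline can be sketched with plain NumPy, implementing PCA via the SVD of the centered one-hot matrix. This is a minimal sketch for illustration, not the patent's preprocessing code; the category values and β = 2 are assumptions.

```python
import numpy as np

def onehot(col):
    """One-hot encode a list of categorical values into a 0/1 matrix."""
    cats = sorted(set(col))
    idx = {c: i for i, c in enumerate(cats)}
    out = np.zeros((len(col), len(cats)))
    for r, c in enumerate(col):
        out[r, idx[c]] = 1.0
    return out

def pca_reduce(X, beta):
    """Project X onto its top-beta principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)                      # center before PCA
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:beta].T                      # n x beta reduced vectors

codes = onehot(["line1", "line3", "line1", "line2", "line3", "line2"])
S_r = pca_reduce(codes, beta=2)
print(codes.shape, S_r.shape)  # (6, 3) (6, 2)
```

Note that after PCA the reduced vectors are real-valued, even though the one-hot input is binary.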
In this embodiment the CNN was trained with the Adam optimizer at a learning rate of 0.0001, and dropout (rate 0.0005) was used to prevent overfitting. In the prediction phase, the CNN and BERT act as the base models of the ensemble. Because the two models are heterogeneous, the algorithm can be regarded as ensemble learning in the style of Bagging. In conventional Bagging, every base learner contributes equally to the final decision through a voting mechanism; this embodiment instead selects a weighted-aggregation method in place of decision voting. The output of the model is therefore:
Y = softmax(η_1 · F_CNN(S_r) + η_2 · F_BERT(T))
where F_CNN is the output of the trained CNN, F_BERT is the output of the trained BERT, η_1 and η_2 are the weights associated with the model outputs, and Y is an n-dimensional vector. The softmax function maps the aggregated value of each class into the interval (0, 1), so the values can be interpreted as probabilities.
This embodiment finds weighted aggregation better suited to synthesizing and combining these different types of information. First, because the ensemble consists of only two base learners, a simple voting scheme is ineffective: any divergence between the models produces a tie. Second, weighted aggregation exploits the distinct strengths of each heterogeneous model, yielding more accurate and reliable predictions. Assigning different weights to the models according to their performance, or their relevance to the task, makes the final output a refined combination of the two models' outputs.
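The weighted aggregation of the two base models can be sketched as follows. This is an illustrative sketch, not the patent's code; the toy probability vectors are assumptions, and the weights 0.36 and 0.64 are the values reported later in the embodiment's weight-tuning experiment.

```python
import numpy as np

def embc_aggregate(p_cnn, p_bert, eta1=0.36, eta2=0.64):
    """Y = softmax(eta1 * F_CNN + eta2 * F_BERT), applied row-wise."""
    z = eta1 * p_cnn + eta2 * p_bert
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy per-class scores from the two (hypothetical) trained base models.
p_cnn = np.array([[0.1, 0.7, 0.2]])
p_bert = np.array([[0.2, 0.6, 0.2]])
Y = embc_aggregate(p_cnn, p_bert)
print(Y.argmax(axis=-1))  # [1]: both models favor class 1, so does the ensemble
```

Because softmax is monotone, the predicted class is simply the argmax of the weighted sum; the softmax only rescales the aggregate into probabilities.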
Experiment
This embodiment uses accident records of the Chongqing urban rail transit system from January 2017 to June 2018. The raw data include the time of occurrence, the location, the detailed course of the accident and its consequences. Based on these data, this embodiment lists in Table 2 eleven features related to accident consequences; these features are the input variables of the accident-outcome classification prediction model.
TABLE 2. Accident features
Before model training, the raw data undergo preprocessing: sorting, normalization and variable categorization. After processing, 997 samples were obtained. The numbers of accidents of the different types are shown in FIG. 3(1). Accident type 2, vehicle equipment and facility failure, occurred 362 times, accounting for 36.3% of the sample; accident type 3, communication-signal equipment failure, occurred 259 times, about 27% of the sample. Maintenance and repair staff of urban rail transit systems should therefore focus on the upkeep of vehicle equipment and communication-signal equipment, which can effectively reduce the frequency of rail transit accidents. Accident type 7 covers accidents caused by human factors; it occurred 124 times, 12.4% of all accidents, and these accidents stem from unsafe passenger behavior. Promoting safety-awareness publicity and education among passengers can therefore effectively reduce the probability of accidents with human causes.
FIG. 3(2) shows the occurrence times of urban rail transit accidents, revealing that accidents happen mainly in the morning and evening peak hours (7:00 to 9:00 and 17:00 to 19:00); the number of accidents in the morning peak is higher than in the evening peak. FIG. 3(2) also shows that weekday accidents exhibit distinct peaks while weekend accidents are distributed relatively evenly, so operating time has a significant effect on accident occurrence. To improve the operational safety of urban rail transit, measures to strengthen accident prevention are especially important during the peak hours (particularly 7:00 to 9:00 and 17:00 to 19:00), for example enhanced monitoring and inspection of stations and trains; maintenance of urban rail transit facilities should be strengthened during weekends and off-peak hours.
FIG. 4 shows the accident types occurring on the different lines. Line 1 has the largest number of communication-signal equipment failures. Lines 2, 3 and 6 frequently suffer accidents related to vehicle equipment and facility failures; Lines 2 and 3 are also prone to human-factor incidents, power-equipment problems and signal failures, while Line 6 often encounters communication-signal and vehicle equipment failures. Targeted preventive measures for each line can therefore effectively reduce the frequency of urban rail transit accidents: for Lines 1 and 6, where communication-signal equipment failures are common, signal-equipment inspection and troubleshooting should be strengthened; for Lines 2 and 3, where human-factor incidents, power-equipment problems and signal failures occur frequently, passenger safety awareness should be cultivated and the professional knowledge and skills of staff improved.
Predictive performance assessment
As noted above, the historical accident data used to predict urban rail transit accident outcomes are imbalanced. Because AUC (area under the curve) is insensitive to imbalanced samples, this embodiment uses four metrics to evaluate the proposed model: true positive rate (TPR), false positive rate (FPR), the receiver operating characteristic (ROC) curve and the area under the curve (AUC).
The ROC curve is a graphical representation of the model performance, with the vertical axis representing TPR and the horizontal axis representing FPR. AUC represents the area under the ROC curve, ranging in value from 0.5 to 1. Higher AUC values indicate better predictive performance. The calculation of TPR and FPR is as follows:
TPR=TP/(TP+FN)
FPR=FP/(FP+TN)
where TP is the number of true positives (predicted positive and actually positive), FN the number of false negatives (predicted negative but actually positive), FP the number of false positives (predicted positive but actually negative), and TN the number of true negatives (predicted negative and actually negative).
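The two metric formulas above reduce to a few lines of code. This is a minimal sketch with illustrative counts, not values from the paper's experiments.

```python
def tpr_fpr(tp, fn, fp, tn):
    """TPR = TP / (TP + FN); FPR = FP / (FP + TN)."""
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical confusion-matrix counts for one outcome class.
tpr, fpr = tpr_fpr(tp=80, fn=20, fp=10, tn=90)
print(tpr, fpr)  # 0.8 0.1
```

Sweeping the classification threshold and plotting each resulting (FPR, TPR) pair traces the ROC curve; the AUC is the area under that curve.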
Prediction result
Following the rail transit operating company's classification of accident severity, the consequences of accidents on the Chongqing urban rail transit system are divided into 7 categories, as shown in Table 3. The proposed EMBC model is then used to predict rail transit accident outcomes from the historical accident data.
TABLE 3. Accident outcome categories

No.  Accident outcome category
1    Train overrunning a station
2    Clearing passengers from the train
3    Train delayed 2–5 min
4    Train delayed 5–15 min
5    Train delayed 15–30 min
6    Train withdrawn from service (operation stopped)
7    Other accident conditions
To evaluate the effectiveness of the proposed EMBC ensemble method in predicting accident outcomes, this embodiment applies BERT, CNN and EMBC to learn and predict accident outcomes. FIGS. 5(1), (2) and (3) show the ROC curves of the three models. Overall, the area under the ROC curve (AUC) of the CNN model is consistently above 0.70, although its accuracy on outcome category 1 is low (AUC 0.49) while it performs better on the other categories. BERT is less accurate than the CNN on outcome categories 3, 4 and 7 but surpasses the CNN on the others. Compared with CNN and BERT, EMBC achieves higher precision and wider coverage, with ROC curves well above the diagonal. In summary, the prediction results show that EMBC outperforms BERT and CNN in both accuracy and coverage when classifying urban rail transit accident outcomes.
In the EMBC ensemble, this embodiment computes a weighted sum of the predictions of BERT and CNN to generate the final output. The weights of the two models' predictions significantly affect the accuracy, so the weights are adjusted continuously to find the pair that gives the highest prediction accuracy.
The weights of BERT and CNN sum to 1, and FIG. 6 shows the relationship between the BERT weight and model prediction accuracy. The BERT weight has a significant impact on the accuracy of EMBC: as the weight increases the accuracy first rises, and as it continues to increase the accuracy falls. When the BERT weight reaches 0.64, the prediction accuracy stabilizes around 0.73. This shows that EMBC predicts urban rail transit accident outcomes best when the BERT weight is set to 0.64 and the CNN weight to 0.36.
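The weight search described above can be sketched as a simple grid sweep over the BERT weight, with the CNN weight fixed at its complement. The helper and the tiny synthetic data below are illustrative assumptions, not the paper's data or code.

```python
import numpy as np

def sweep_bert_weight(p_cnn, p_bert, labels, steps=101):
    """Try eta_bert in [0, 1] (eta_cnn = 1 - eta_bert); return (eta, accuracy) of the best."""
    best = (0.0, -1.0)
    for eta in np.linspace(0.0, 1.0, steps):
        pred = ((1.0 - eta) * p_cnn + eta * p_bert).argmax(axis=-1)
        acc = float((pred == labels).mean())
        if acc > best[1]:
            best = (float(eta), acc)
    return best

# Tiny synthetic example: BERT is right on both samples, CNN only on one,
# so the sweep should settle on a BERT-dominated weight.
p_cnn = np.array([[0.9, 0.1], [0.8, 0.2]])
p_bert = np.array([[0.2, 0.8], [0.9, 0.1]])
labels = np.array([1, 0])
eta, acc = sweep_bert_weight(p_cnn, p_bert, labels)
print(eta, acc)
```

In practice the accuracy would be measured on a held-out validation set rather than on the training data, so that the chosen weights generalize.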
Comparing the performance of different models is critical to selecting the most accurate and appropriate one. In this study, the present embodiment uses the same urban rail transit accident data set to evaluate model performance, with 80% of the data set used for training and 20% for testing. Fig. 7 shows the training and testing results of EMBC and other conventional machine learning models. Fig. 7 shows that the multi-layer perceptron model and the Bayesian network model have similar prediction accuracy, while the support vector machine model has the lowest accuracy among the compared models; this performance difference is mainly caused by the heterogeneity of the data. In contrast, the proposed EMBC model exhibits excellent predictive performance, reaching an accuracy of 0.829 on the test set, a clear improvement over the other conventional models. The high accuracy of the EMBC provided by this embodiment is mainly because BERT is introduced into the model, which can effectively process the semantic information in the text data and thus overcome the challenges brought by data heterogeneity.
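The 80/20 split used in the comparison can be sketched as a seeded random partition; the record count and seed below are illustrative, not taken from the embodiment.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # fixed seed for a reproducible split
n = 100                                # illustrative number of accident records
idx = rng.permutation(n)               # shuffle record indices
split = int(0.8 * n)
train_idx, test_idx = idx[:split], idx[split:]  # 80% train / 20% test
```

Holding the split fixed across all compared models ensures that the accuracy differences in Fig. 7 reflect the models rather than the sampling.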
Conclusion
Supported by big data technology and data analysis methods, research on traffic accident prediction modeling continues to deepen. Based on historical accident data of the Chongqing rail transit system, this embodiment analyzes the accident distribution over different time periods and the accident counts of different lines and, based on the analysis results, proposes measures and suggestions to reduce the impact of accidents and lower their frequency of occurrence. To analyze and predict the consequences of rail transit accidents, this embodiment provides an ensemble learning method (EMBC) based on BERT and CNN, with 10 features such as accident type, accident occurrence time and accident location selected as input variables. To address the heterogeneity of the historical text data, the embodiment applies BERT to learn the meaning of the text data and improve learning ability, and CNN to capture spatial patterns in the accident data. Finally, in the ensemble learning part, the results of BERT and CNN are combined through a voting method, which enhances the generalization ability of the model and improves prediction precision. Experimental results show that the EMBC method can effectively predict urban rail transit accident outcomes, with an accuracy of 82.9%. Compared with traditional machine learning models such as the multi-layer perceptron, support vector machine and Bayesian network, the prediction accuracy of the proposed EMBC model is at least 20 percent higher. The invention establishes an accurate and reliable prediction model, which helps identify potential risks and hidden dangers in advance and accurately predict the consequences caused by accidents, providing a new method for reducing the occurrence frequency of urban rail transit accidents and weakening their impact.
The invention and its embodiments have been described above by way of illustration rather than limitation, and the actual structure is not limited to what is shown in the accompanying drawings. Therefore, if a person of ordinary skill in the art, informed by this disclosure, adopts a structural mode or embodiment similar to the technical scheme without creative design and without departing from the gist of the present invention, such a mode or embodiment shall fall within the protection scope of the present invention.

Claims (3)

1. A method for predicting urban rail transit accident outcomes based on an integrated learning model, characterized by comprising the following steps: establishing an integrated learning model (EMBC) based on a convolutional neural network CNN and a BERT model; in the EMBC, the CNN is used to capture numerical accident information, the BERT is used to learn the complex relations in accident text descriptions, and a Bagging method based on self-learned parameters is used to aggregate the classification results of BERT and CNN to obtain the final urban rail transit accident outcome prediction;
in the CNN, a fully connected layer with 7 units is used, and a softmax function is then used to generate the predicted probabilities of the accident outcomes; the CNN is trained using a binary cross-entropy loss function and optimized using the Adam optimization algorithm;
in the BERT, in each Transformer layer, the relationships between different embedded representations are learned using a self-attention mechanism; using multi-head attention enables the model to jointly attend to information from different representation subspaces at different positions:
MultiHead(Q,K,V)=Concat(head_1,…,head_n)W^O
wherein Q, K and V represent the query embedding, key embedding and value embedding, respectively; W^O is a matrix of learnable parameters;
the output of the BERT comes from the top Transformer encoder; the [CLS] tag is a special symbol placed at the beginning of the text, and emphasis is placed on its final representation, which serves as the aggregate sequence representation for the classification task; this [CLS] representation is then passed to the classification layer, and the BERT uses a softmax layer to classify the categories of urban rail transit accident consequences.
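The multi-head attention of the formula above can be sketched as follows. This is a minimal NumPy sketch, not the BERT configuration of the embodiment: the dimensions, random weights, and helper names are illustrative assumptions.

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax with max subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    return softmax_rows(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head(Q, K, V, heads, W_O):
    """MultiHead(Q,K,V) = Concat(head_1, ..., head_n) W^O, where each
    head projects Q, K, V with its own learnable matrices."""
    outs = [attention(Q @ Wq, K @ Wk, V @ Wv) for Wq, Wk, Wv in heads]
    return np.concatenate(outs, axis=-1) @ W_O

rng = np.random.default_rng(1)
d_model, d_k, n_heads, seq = 16, 4, 4, 5
X = rng.normal(size=(seq, d_model))                  # token embeddings
heads = [tuple(rng.normal(size=(d_model, d_k)) for _ in range(3))
         for _ in range(n_heads)]                    # (W^Q, W^K, W^V) per head
W_O = rng.normal(size=(n_heads * d_k, d_model))      # learnable output projection
out = multi_head(X, X, X, heads, W_O)                # self-attention: Q = K = V = X
```

In self-attention, as in each BERT Transformer layer, the query, key and value embeddings all come from the same token sequence.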
2. The method for predicting urban rail transit accident consequences based on the ensemble learning model according to claim 1, characterized in that: during EMBC training, the input is a triplet D(S, T, L), wherein S is an n×d vector set, n represents the total number of accident statistic records, and d represents the number of accident features; T represents a set of accident description strings; L represents a set of tags of accident consequences;
to better represent the different categories of statistics, one-hot encoding is applied to each S_i to generate a vector, denoted S_Oi; a principal component analysis method is used to reduce the dimensionality of the high-dimensional data, the reduced dimension is denoted β and is treated as a hyperparameter of the model; the vector after dimension reduction is denoted S_ri; thus, the data processing and dimension reduction process is expressed as:
S_ri=PCA_β(ONEHOT(S_i))
wherein ONEHOT is the one-hot encoding function and PCA is the principal component analysis dimension-reduction function.
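The encoding and reduction step of claim 2 can be sketched as below; the category codes and the choice of β are illustrative, and the PCA is implemented directly via SVD rather than any particular library routine.

```python
import numpy as np

def one_hot(codes, n_categories):
    """ONEHOT: map integer category codes to one-hot row vectors."""
    return np.eye(n_categories)[codes]

def pca_reduce(X, beta):
    """PCA_beta: project centred data onto its top-beta principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt = components
    return Xc @ Vt[:beta].T

codes = np.array([0, 2, 1, 2, 0, 1])    # illustrative categorical feature codes
S_o = one_hot(codes, n_categories=3)    # S_Oi: one-hot encoded records
S_r = pca_reduce(S_o, beta=2)           # S_ri: beta-dimensional reduced vectors
```

In practice each categorical accident feature is one-hot encoded and the concatenated vectors are reduced together, with β tuned as the model's hyperparameter.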
3. The method for predicting urban rail transit accident consequences based on the ensemble learning model according to claim 2, characterized in that: the ensemble learning Bagging method adopts a weighted aggregation method, and the output of the EMBC is therefore defined as:
Y=softmax(η1*F_CNN(S_r)+η2*F_BERT(T))
wherein F_CNN represents the output of the trained CNN and F_BERT represents the output of the trained BERT; η1 and η2 are the weights applied to the two model outputs; Y is an n-dimensional vector; the softmax function converts the aggregate value of each class to a value between 0 and 1.
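The aggregation of claim 3 can be sketched directly from the formula; the model outputs below are random stand-ins, and the weights 0.36/0.64 are taken from the optimum reported in the embodiment.

```python
import numpy as np

def embc_output(f_cnn, f_bert, eta1, eta2):
    """Y = softmax(eta1 * F_CNN(S_r) + eta2 * F_BERT(T)), row-wise."""
    z = eta1 * f_cnn + eta2 * f_bert
    z = z - z.max(axis=1, keepdims=True)   # stability shift before exp
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
f_cnn  = rng.normal(size=(3, 7))   # stand-in trained-CNN outputs, 7 classes
f_bert = rng.normal(size=(3, 7))   # stand-in trained-BERT outputs
Y = embc_output(f_cnn, f_bert, eta1=0.36, eta2=0.64)
```

Each row of Y is a probability distribution over the 7 accident-outcome categories, and the predicted category is its argmax.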
CN202311449164.8A 2023-11-02 2023-11-02 Method for predicting urban rail transit accident result based on integrated learning model Active CN117521882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311449164.8A CN117521882B (en) 2023-11-02 2023-11-02 Method for predicting urban rail transit accident result based on integrated learning model


Publications (2)

Publication Number Publication Date
CN117521882A CN117521882A (en) 2024-02-06
CN117521882B true CN117521882B (en) 2024-05-24

Family

ID=89752302


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117987A (en) * 2018-07-18 2019-01-01 厦门大学 Personalized street accidents risks based on deep learning predict recommended method
CN116128122A (en) * 2023-01-03 2023-05-16 北京交通大学 Urban rail transit short-time passenger flow prediction method considering burst factors
CN116307103A (en) * 2023-02-15 2023-06-23 河南大学 Traffic accident prediction method based on hard parameter sharing multitask learning

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN114283345A (en) * 2021-12-30 2022-04-05 武汉大学 Small sample city remote sensing image information extraction method based on meta-learning and attention




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant