KR101982753B1

KR101982753B1 - Apparatus and method for predicting degree of risk of train accident

Info

Publication number: KR101982753B1
Application number: KR1020170169956A
Authority: KR
Inventors: 김상수; 정지수; 이현경
Original assignee: (주)위세아이텍
Priority date: 2017-12-12
Filing date: 2017-12-12
Publication date: 2019-05-27

Abstract

The present invention relates to a method for predicting train accident risk, comprising the steps of: receiving new data that is the target of train accident risk prediction; selecting a risk prediction model for predicting the train accident risk of the new data, based on a data set including train accident data and operation performance data; and predicting the train accident risk of the new data based on the selected risk prediction model.

Description

The present application relates to an apparatus and method for predicting the risk of a train accident.

Train accidents (railway accidents) are among the most serious kinds of accident, since a single one can cause heavy casualties and enormous losses. Nevertheless, technology for analyzing the risk of train accidents has so far been insufficiently developed.

For example, techniques that analyze accident risk on the basis of statistical methods are known. However, accident risk analysis based on statistical methods is limited in how accurately it can predict the risk of a train accident.

The background art of the present application is disclosed in Korean Registered Patent Publication No. 10-1020191.

The present application is intended to solve the problems of the prior art described above, and its object is to provide an apparatus and method for predicting train accident risk that can effectively prevent train accidents in advance by predicting their risk more accurately.

However, the technical problems to be solved by the embodiments of the present application are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above object, a method for predicting train accident risk according to a first aspect of the present application may include: receiving new data that is the target of train accident risk prediction; selecting a risk prediction model for predicting the train accident risk of the new data, based on a data set including train accident data and operation performance data; and predicting the train accident risk of the new data based on the selected risk prediction model.

The method according to the first aspect may further include: performing preprocessing that normalizes the data stored in the data set; and generating a risk prediction model based on the normalized data.

The preprocessing step may include: calculating a severity for each record in the data set based on the normalized data; and selecting at least one variable for generating the risk prediction model based on the normalized data. The step of generating the risk prediction model may generate the model using the risk level value obtained from the severity calculation and the selected variable.

The step of calculating the severity may calculate the severity in consideration of the accident damage scale and the accident frequency.

The step of calculating the severity may further take the delay time into account.

The method according to the first aspect may further include training the risk prediction model by applying a plurality of machine learning algorithms to the generated risk prediction model. In the step of selecting the risk prediction model, the risk prediction model to which the machine learning algorithm showing the highest accuracy among the plurality of machine learning algorithms is applied may be selected, based on the training results, as the risk prediction model for predicting the train accident risk of the new data.

The plurality of machine learning algorithms may include at least one of a random forest algorithm, a support vector machine (SVM) algorithm, and a k-nearest neighbors (KNN) algorithm.

The predicting step may predict the train accident risk of the new data as one of normal, caution, warning, and danger.

The data set may include a plurality of records, and each record may be generated by linking the train accident data and the operation performance data on the basis of operation date, formation number, and train number.

An apparatus for predicting train accident risk according to a second aspect of the present application may include: a receiving unit that receives new data that is the target of train accident risk prediction; a risk prediction model selection unit that selects a risk prediction model for predicting the train accident risk of the new data, based on a data set including train accident data and operation performance data; and a prediction unit that predicts the train accident risk of the new data based on the selected risk prediction model.

The means for solving the problems described above are merely exemplary and should not be construed as limiting the present application. In addition to the exemplary embodiments described above, additional embodiments may exist in the drawings and the detailed description.

According to the means described above, the risk of a train accident can be predicted by a risk prediction model generated from train accident data and operation performance data, and train accidents can thereby be effectively prevented in advance.

However, the effects obtainable from the present application are not limited to those described above, and other effects may exist.

FIG. 1 is a diagram showing a schematic configuration of a train accident risk prediction apparatus according to the first aspect of the present application.
FIG. 2 is a diagram showing an example of variable selection in the train accident risk prediction apparatus according to the first aspect of the present application.
FIG. 3 is a diagram showing train accident risk prediction results for risk prediction models to which a plurality of machine learning algorithms are applied, in the train accident risk prediction apparatus according to the first aspect of the present application.
FIG. 4 is a diagram showing the prediction accuracy of each of a plurality of risk prediction models corresponding to the respective machine learning algorithms, in the train accident risk prediction apparatus according to the first aspect of the present application.
FIG. 5 is a diagram showing an example of the train accident risk prediction result obtained when a risk prediction model is applied to actual new data, in the train accident risk prediction apparatus according to the first aspect of the present application.
FIG. 6 is a diagram schematically showing the configuration of an apparatus for analyzing associations between railway accident data types according to the second aspect of the present application.
FIG. 7 is a diagram showing an example of railway accident data acquired by the apparatus for analyzing associations between railway accident data types according to the second aspect of the present application.
FIG. 8 is a diagram showing railway accident data as a two-dimensional matrix for applying the SVD technique in the apparatus for analyzing associations between railway accident data types according to the second aspect of the present application.
FIG. 9 is a diagram showing an example in which the result of the association analysis between railway accident data types is displayed in a form that is easy for a user to understand, in the apparatus for analyzing associations between railway accident data types according to the second aspect of the present application.
FIG. 10 is an operation flowchart of a method for predicting train accident risk according to the third aspect of the present application.
FIG. 11 is an operation flowchart of a method for analyzing associations between railway accident data types according to the fourth aspect of the present application.

Hereinafter, embodiments of the present application are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present application pertains can readily practice them. However, the present application may be implemented in various different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted in order to describe the present application clearly, and similar parts are given similar reference numerals throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" or "indirectly connected" with another element in between.

Throughout this specification, when a member is said to be located "on", "above", "at the top of", "under", "below", or "at the bottom of" another member, this includes not only the case where the member is in contact with the other member but also the case where a further member is present between the two members.

Throughout this specification, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them.

FIG. 1 is a diagram showing a schematic configuration of a train accident risk prediction apparatus according to the first aspect of the present application.

Referring to FIG. 1, the train accident risk prediction apparatus 10 according to the first aspect of the present application may include a receiving unit 11, a risk prediction model selection unit 12, and a prediction unit 13.

The receiving unit 11 may receive new data that is the target of train accident risk prediction. Here, the new data may be train-related data associated with at least one of train accident data and operation performance data. In the present application, the term "train" may also refer to a railway.

The risk prediction model selection unit 12 may select a risk prediction model for predicting the train accident risk of the new data, based on a data set including train accident data and operation performance data. In the present application, the Korean terms 모델 and 모형, both meaning "model", are used interchangeably.

Here, the train accident risk prediction apparatus 10 according to the first aspect of the present application may include a data set generation unit (not shown) that generates the data set. Train accident data and operation performance data may be integrated and stored in the data set, so that the data set holds train accident data associated with operation performance. Train accident data is data on past train accidents and train failures, and may also be called accident and failure data, railway accident data, and so on. Operation performance data is a record of train operation and may include, but is not limited to, information such as the number of train runs and the distance travelled (for example, in kilometers).

The data set may also include a plurality of records. Here, a record may be generated by linking train accident data and operation performance data on the basis of operation date, formation number, and train number. In other words, to predict the train accident risk (risk level), train accident data and operation performance data may be linked by operation date, formation number, and train number and stored in the data set. Each item of data linked in this way and stored in the data set may be called one record.
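By way of illustration only, the linking of the two data sources on the three-part key might be sketched as follows. The field names (`operation_date`, `formation_no`, `train_no`, `run_km`, `material_damage`), the train numbers, and all numeric values are hypothetical, and treating each operation row as the base of a record is an assumption; the specification states only which three keys are used.

```python
from collections import defaultdict

# Hypothetical field names; the text only names the three linking keys.
KEY_FIELDS = ("operation_date", "formation_no", "train_no")


def link_records(accidents, operations):
    """Attach accident rows to the operation rows that share the three-part key."""
    accidents_by_key = defaultdict(list)
    for row in accidents:
        accidents_by_key[tuple(row[f] for f in KEY_FIELDS)].append(row)

    records = []
    for op in operations:
        key = tuple(op[f] for f in KEY_FIELDS)
        matched = accidents_by_key.get(key, [])  # empty list when no accident matches
        record = dict(op)
        record["accident_count"] = len(matched)
        record["accidents"] = matched
        records.append(record)
    return records


# Illustrative rows only.
operations = [
    {"operation_date": "20160828", "formation_no": "1C2004", "train_no": "K101", "run_km": 412.0},
    {"operation_date": "20160828", "formation_no": "1C2015", "train_no": "K205", "run_km": 388.5},
]
accidents = [
    {"operation_date": "20160828", "formation_no": "1C2015", "train_no": "K205", "material_damage": 1200.0},
]
records = link_records(accidents, operations)
```

In this sketch every operation row yields exactly one record, so runs without any accident remain in the data set with an accident count of zero.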

The selection of the risk prediction model is described in detail below.

The train accident risk prediction apparatus 10 according to the first aspect of the present application may include a preprocessing unit 14, a risk prediction model generation unit 15, and a learning unit 16.

The preprocessing unit 14 may perform preprocessing that normalizes the data stored in the data set. That is, it may perform data normalization that brings the variable values of the stored data onto a common scale. Specifically, to improve the stability and accuracy of the risk prediction model and to effectively reduce the error in the stored data, the preprocessing unit 14 may normalize, by the MIN/MAX method, those parts of the data whose variable values are not on a uniform scale. Through MIN/MAX normalization, the preprocessing unit 14 may convert each variable value of the data stored in the data set into a value in the range from a minimum of 0 to a maximum of 1.
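As a minimal sketch of MIN/MAX normalization, each value v of a variable can be mapped to (v - min) / (max - min), which places the smallest value at 0 and the largest at 1. The function name and the handling of a constant column are assumptions made for illustration and are not taken from the specification.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so that min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


run_km = [120.0, 300.0, 480.0]  # illustrative values only
normalized = min_max_normalize(run_km)
```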

The preprocessing unit 14 may also calculate a severity for each record in the data set based on the normalized data (that is, the normalized data stored in the data set). A train accident risk level value may then be derived for each record from the per-record severity.

The preprocessing unit 14 may calculate the severity in consideration of the accident damage scale and the accident frequency. In particular, it may calculate the severity as the product of the accident damage scale and the accident frequency.

In other words, the preprocessing unit 14 may calculate the severity of each record in the data set by multiplying the accident frequency by the damage scale of each accident. Here, the accident damage scale may be calculated from the scale of material damage caused by the train accident. It may also be calculated by adding the scale of business damage to the scale of material damage; that is, business damage may be taken into account in addition to material damage.

In addition to the accident damage scale and the accident frequency, the preprocessing unit 14 may further take the delay time into account when calculating the severity. Here, the delay time may mean the train delay caused by a train accident.
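The severity calculation described above can be sketched as follows, purely as an assumption-laden illustration. The function and parameter names are hypothetical, the inputs are assumed to be already normalized, and the delay time is left out because the text does not state how it enters the calculation.

```python
def accident_damage_scale(material_damage, business_damage=0.0):
    """Damage scale: material damage, optionally plus business damage."""
    return material_damage + business_damage


def severity(accident_frequency, material_damage, business_damage=0.0):
    """Severity of one record: accident frequency times accident damage scale."""
    return accident_frequency * accident_damage_scale(material_damage, business_damage)


# Illustrative numbers only: 2 accidents, damage scale 0.25 + 0.5 = 0.75.
example = severity(accident_frequency=2, material_damage=0.25, business_damage=0.5)
```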

The preprocessing unit 14 may also select at least one variable for generating the risk prediction model, based on the normalized data (that is, the normalized data stored in the data set). Put differently, it may select the variables (important variables) needed for train accident risk prediction. Variables are selected so that only those that influence the risk prediction model are used when the model is generated.

FIG. 2 is a diagram showing an example of variable selection in the train accident risk prediction apparatus according to the first aspect of the present application.

Referring to FIG. 2, the data set may contain, as normalized data, a plurality of variables such as D232, D431, D121, D145, D117, and D112 in relation to, for example, the variable rf_1C2015, and the importance of each variable may be expressed as shown in FIG. 2. A larger value of MeanDecreaseGini (uncertainty) on the horizontal axis may indicate a more important variable.

In relation to the variable rf_1C2015, the preprocessing unit 14 may select, for example, D232 as the variable (important variable) needed for train accident risk prediction. However, the selection is not limited to this, and the number of variables to be selected may be set in various ways based on user input.
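As a rough sketch of this selection step, the variables can be ranked by their importance score and the top few retained, with the count supplied by the user. The scores below are invented for illustration, since the text does not give the values plotted in FIG. 2; only the variable names and the choice of D232 come from the specification. The name `select_variables` and its `top_k` parameter are likewise hypothetical.

```python
def select_variables(importance, top_k=1):
    """Return the names of the top_k variables with the largest importance score."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]


# Invented MeanDecreaseGini-style scores; the real FIG. 2 values are not in the text.
importance = {"D232": 9.0, "D431": 7.5, "D121": 6.0, "D145": 4.5, "D117": 3.0, "D112": 1.5}
selected = select_variables(importance, top_k=1)
top_three = select_variables(importance, top_k=3)
```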

The risk prediction model generation unit 15 may generate a risk prediction model based on the normalized data. Specifically, it may generate the model using the risk level value obtained by calculating the severity from the normalized data, together with the selected variables. That is, the risk prediction model generation unit 15 may generate the risk prediction model from the per-record train accident risk level values produced by the severity calculation and the variables selected for train accident risk prediction. In doing so, it may set the train accident risk level value as the dependent variable and the selected variables as the independent variables.
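The split into independent and dependent variables might look like the following sketch. It assumes that each record already carries a `risk_level` field derived from its severity; that field name, the helper `build_training_set`, and the sample values are hypothetical, and the text does not say how severity is converted into a risk level.

```python
def build_training_set(records, selected_variables, target="risk_level"):
    """Split records into feature rows X (independent) and labels y (dependent)."""
    X = [[rec[name] for name in selected_variables] for rec in records]
    y = [rec[target] for rec in records]
    return X, y


# Illustrative records only.
records = [
    {"D232": 0.10, "D431": 0.80, "risk_level": 1},
    {"D232": 0.90, "D431": 0.20, "risk_level": 4},
]
X, y = build_training_set(records, ["D232"])
```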

The learning unit 16 may train the risk prediction model generated by the risk prediction model generation unit 15 by applying a plurality of machine learning algorithms to it. In other words, the generated risk prediction model may be trained by applying a plurality of machine learning algorithms.

The generated risk prediction model is applied to each of the plurality of machine learning algorithms, so that a train accident risk (risk level) may be calculated for each algorithm. Put differently, the train accident risk level may be determined as the result of applying each machine learning algorithm.

Here, the plurality of machine learning algorithms may include at least one of a random forest algorithm, a support vector machine (SVM) algorithm, and a k-nearest neighbors (KNN) algorithm. However, the algorithms are not limited to these, and various machine learning algorithms may be applied.
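Of the three algorithms named above, KNN is simple enough to sketch with the standard library alone, so it is shown here as one possible illustration; random forest and SVM would normally be taken from a machine learning library. The function name, the Euclidean distance, the value of k, and the sample data are all assumptions made for this sketch and are not specified in the text.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Illustrative one-feature training data with risk levels 1 and 4.
train_X = [[0.1], [0.2], [0.8], [0.9]]
train_y = [1, 1, 4, 4]
low = knn_predict(train_X, train_y, [0.15])
high = knn_predict(train_X, train_y, [0.85])
```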

Based on the training results from the learning unit 16, the risk prediction model selection unit 12 may select, as the risk prediction model for predicting the train accident risk of the new data, the risk prediction model to which the machine learning algorithm showing the highest accuracy among the plurality of machine learning algorithms is applied. Put differently, the selection is based on the results of training the generated risk prediction model with each of the machine learning algorithms.

For example, the risk prediction model generation unit 15 may apply the generated risk prediction model to each of the plurality of machine learning algorithms and thereby generate a plurality of risk prediction models, one per algorithm. The learning unit 16 may then train each of these risk prediction models. Finally, based on the training results, the risk prediction model selection unit 12 may select the most accurate of the risk prediction models as the one used to predict the train accident risk of the new data.

FIG. 3 is a diagram showing train accident risk prediction results for risk prediction models to which a plurality of machine learning algorithms are applied, in the train accident risk prediction apparatus according to the first aspect of the present application.

Referring to FIG. 3, (a) to (c) show, as confusion matrices, the performance evaluation results of the risk prediction models to which the random forest, SVM, and KNN algorithms are respectively applied. In each of (a) to (c), the vertical axis represents the true label and the horizontal axis represents the predicted label. Accordingly, the risk prediction model selection unit 12 may use the confusion matrices to select, as the risk prediction model for predicting the train accident risk of the new data, the model to which the machine learning algorithm showing the highest accuracy is applied.
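For illustration, a confusion matrix of the kind shown in FIG. 3 can be built by counting (true label, predicted label) pairs, with rows indexed by the true label and columns by the predicted label. Accuracy is then the sum of the diagonal divided by the total count. The labels below are invented sample data and the function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count outcomes: rows are true labels, columns are predicted labels."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. predicted correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Invented sample labels using the four risk values 1 to 4.
y_true = [1, 1, 2, 3, 4, 4]
y_pred = [1, 2, 2, 3, 4, 1]
cm = confusion_matrix(y_true, y_pred, classes=[1, 2, 3, 4])
acc = accuracy(cm)
```

In this sample four of the six predictions fall on the diagonal.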

FIG. 4 is a diagram showing the prediction accuracy of each of a plurality of risk prediction models corresponding to the respective machine learning algorithms, in the train accident risk prediction apparatus according to the first aspect of the present application.

Referring to FIG. 4, for example, the accuracy of the risk prediction model to which the random forest algorithm is applied may be 60%, that of the model to which the SVM algorithm is applied may be 53%, and that of the model to which the KNN algorithm is applied may be 52%. In this case, the risk prediction model selection unit 12 may select the model to which the random forest algorithm is applied as the risk prediction model for predicting the train accident risk of the new data.
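Using the example accuracies from FIG. 4, the selection reduces to picking the entry with the largest value, as in the sketch below. The dictionary keys and the function name are hypothetical labels chosen for this illustration.

```python
def select_best(accuracies):
    """Return the name of the model with the highest accuracy."""
    return max(accuracies, key=accuracies.get)


# Accuracies taken from the FIG. 4 example in the text.
accuracies = {"random_forest": 0.60, "svm": 0.53, "knn": 0.52}
best = select_best(accuracies)
```

If two models tie, `max` returns the one listed first, so a real implementation might need an explicit tie-breaking rule; the specification does not address this case.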

The prediction unit 13 may predict the train accident risk (risk level) of the new data based on the risk prediction model selected by the risk prediction model selection unit 12. That is, the prediction may be based on the most accurate of the plurality of risk prediction models corresponding to the plurality of machine learning algorithms.

The prediction unit 13 may predict the train accident risk (risk level) of the new data as one of normal, caution, warning, and danger. Specifically, when the train accident risk value of the new data calculated from the risk prediction model is 1, the prediction unit 13 may predict the risk level as 'normal'. When the value is 2, it may predict the risk level as 'caution'. When the value is 3, it may predict the risk level as 'warning'. When the value is 4, it may predict the risk level as 'danger'.
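The mapping from risk value to risk level can be sketched as a simple lookup. The function name is hypothetical, and raising an error for values outside 1 to 4 is an assumption; the text does not say how such values would be handled.

```python
RISK_LABELS = {1: "normal", 2: "caution", 3: "warning", 4: "danger"}


def risk_label(value):
    """Translate a predicted risk value (1 to 4) into its risk level name."""
    try:
        return RISK_LABELS[value]
    except KeyError:
        # Assumption: any other value is treated as an error.
        raise ValueError(f"unexpected risk value: {value!r}") from None


labels = [risk_label(v) for v in (1, 2, 3, 4)]
```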

FIG. 5 shows an example of train accident risk prediction results obtained by applying the risk prediction model to actual new data in the train accident risk prediction apparatus according to the first aspect of the present application.

Referring to FIG. 5, as an example, the data set may store the operation date and the formation number (trainset number) of the new data linked together as a single piece of data. For example, the first of the records stored in the data set may hold '20160828' as the date ID and '1C2004' as the formation number. In the example of FIG. 5, applying the first record to the risk prediction model selected by the risk prediction model selection unit 12 yields a train accident risk value (predicitions_rf_1) of '1', so the prediction unit 13 may predict the train accident risk (risk level) for the new data corresponding to the first record as 'normal'.

The train accident risk prediction apparatus 10 according to the first aspect of the present application generates a data set containing train accident data tied to running performance (that is, containing both train accident data and running performance data); normalizes the data set to reduce the error of the risk (risk level) prediction for new data; calculates, based on the data set, the severity that determines the train accident risk (risk level); selects the important variables for generating the risk prediction model; and generates the risk prediction model in view of the selected important variables and the calculated severity. The apparatus can then predict the risk (risk level) for new data based on the generated risk prediction model. In other words, the present application can predict the train accident risk using a risk prediction model built from train accident data and running performance data, and can thereby effectively prevent train accidents in advance.

In addition, the present application can link the train accident data with the running performance data and store them as big data in the data set. The present application can also predict the risk of a train accident using machine learning algorithms keyed on the formation number and the train number. In other words, by predicting the train accident risk with big-data-based machine learning algorithms, the present application can achieve a more accurate and improved prediction rate.

The following describes an apparatus for analyzing associations between types of train accident data according to a second aspect of the present application. Here, the train accident data means the train accident data included in the data set described above, and is hereinafter also referred to as railway accident data.

That is, the technique for analyzing associations between railway accident data types according to the second aspect of the present application concerns a method of constructing a two-dimensional matrix from railway accident data in order to analyze the association between types of that data, and a technique for displaying the association values computed by applying singular value decomposition (SVD) to that matrix.

FIG. 6 schematically shows the configuration of the apparatus 100 for analyzing associations between types of railway accident data (train accident data) according to the second aspect of the present application.

Referring to FIG. 6, the apparatus 100 for analyzing associations between railway accident data types according to the second aspect of the present application may include a processor 110, and the processor 110 may include a redefinition unit 120, a matrix generation unit 130, an analysis unit 140, and a control unit 150.

The processor 110 may control the operation of the redefinition unit 120, the matrix generation unit 130, the analysis unit 140, and the control unit 150.

The redefinition unit 120 may redefine the acquired railway accident data. Before describing this, an example of the acquired railway accident data is given first.

FIG. 7 shows an example of the railway accident data acquired by the apparatus for analyzing associations between railway accident data types according to the second aspect of the present application.

Referring to FIG. 7, the data items of the railway accident data may be divided into accident/failure type, date and time of occurrence, weather, responsible affiliation, railway type, place of occurrence, and so on.

More specifically, the accident/failure type information may include vehicle failure and the like; the date and time information may include the year, month, day, time, and day of the week; and the weather information may include weather conditions such as snow, cloudy, and clear, together with the temperature. The affiliation information may include affiliations such as Seoul and Jeonbuk; the railway type information describes the vehicle type and may include high-speed railway, conventional railway, and urban railway; and the place of occurrence information may include the route, station section, province/metropolitan city, and city/county/district.

To analyze the association between types of railway accident data, the redefinition unit 120 may redefine the railway accident data acquired as in FIG. 7, for example.

In more detail, the redefinition unit 120 may redefine the number of rows of the railway accident data as the number of accidents. The redefinition unit 120 may also redefine the header row of the railway accident data (for example, the row containing accident/failure type 1, accident/failure type 2, year, month, day, time, day of the week, weather, temperature, affiliation, high-speed/conventional/urban, route, station section, and so on) as viewpoints (that is, the basic starting points of thought that frame a view of objects and phenomena), and redefine the contents under each item as members of that viewpoint. For example, route may be a viewpoint, and the Gyeongbu Line, the Honam Line, and so on may be its members.
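The redefinition described above can be sketched as follows, assuming each accident row is held as a dictionary keyed by the header row. The function name `redefine`, the field names, and the sample rows are hypothetical and are not taken from FIG. 7.

```python
from collections import defaultdict


def redefine(rows):
    """Return (accident_count, {viewpoint: set of members}).

    Each row is one accident, each header becomes a viewpoint, and each
    distinct value under a header becomes a member of that viewpoint.
    """
    members = defaultdict(set)
    for row in rows:
        for viewpoint, member in row.items():
            members[viewpoint].add(member)
    return len(rows), dict(members)


# Hypothetical sample rows (not the data of FIG. 7).
rows = [
    {"route": "Gyeongbu Line", "rail_type": "high-speed"},
    {"route": "Honam Line", "rail_type": "conventional"},
    {"route": "Gyeongbu Line", "rail_type": "urban"},
]
count, viewpoints = redefine(rows)
```

Here `count` is the number of accidents, and `viewpoints["route"]` holds the members of the route viewpoint.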

The matrix generation unit 130 may generate a two-dimensional matrix using some of the data items in the acquired railway accident data. Put differently, the matrix generation unit 130 may generate the two-dimensional matrix based on the railway accident data redefined by the redefinition unit 120.

The matrix generation unit 130 may generate the two-dimensional matrix through a pivot function, and the pivot function may include at least one of the pivot table of Excel (MS Excel) and the pivot grid provided by multidimensional analysis (On-Line Analytical Processing, OLAP) tools. The pivot function is one data analysis technique; the example above is only one embodiment, and other kinds of analysis techniques may also be applied.

An example of the two-dimensional matrix generated by the matrix generation unit 130 may be as shown in FIG. 8.

FIG. 8 shows railway accident data arranged as a two-dimensional matrix so that the SVD technique can be applied in the apparatus for analyzing associations between railway accident data types according to the second aspect of the present application.

Referring to FIG. 8, as an example, the matrix generation unit 130 may generate a two-dimensional matrix from the members of the railway type item (high-speed railway, conventional railway, urban railway) and the members of the route item (Gyeongbu Line, Gyeongwon Line, Gyeongjeon Line, Gyeongchun Line, and so on). For example, according to the two-dimensional matrix of FIG. 8, the number of accidents on the Gyeongbu Line may be 5 for high-speed railway, 11 for urban railway, and 9 for conventional railway, and the number of urban railway accidents on the Gyeongwon Line may be 10.
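The pivot step can be sketched without a spreadsheet as a simple cross-tabulation. This is an illustration under stated assumptions: the function name `pivot_counts` and the field names are hypothetical, and the sample rows are made up rather than taken from FIG. 8.

```python
from collections import Counter


def pivot_counts(rows, row_key, col_key):
    """Cross-tabulate accident rows into a 2D count matrix.

    Returns (row_labels, col_labels, matrix), where matrix[i][j] is the
    number of accidents with row_labels[i] and col_labels[j].
    """
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    row_labels = sorted({r for r, _ in counts})
    col_labels = sorted({c for _, c in counts})
    # A Counter returns 0 for combinations that never occurred.
    matrix = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, matrix


# Hypothetical sample rows: one dictionary per accident.
sample = [
    {"route": "Gyeongbu Line", "rail_type": "urban"},
    {"route": "Gyeongbu Line", "rail_type": "high-speed"},
    {"route": "Gyeongwon Line", "rail_type": "urban"},
    {"route": "Gyeongbu Line", "rail_type": "urban"},
]
routes, rail_types, matrix = pivot_counts(sample, "route", "rail_type")
```

Rows of `matrix` correspond to routes and columns to railway types, which is the layout the text describes for FIG. 8.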

The analysis unit 140 may analyze the association between the data types corresponding to the two-dimensional matrix by applying singular value decomposition (SVD) to the entries of the two-dimensional matrix generated by the matrix generation unit 130.

Singular value decomposition (SVD) is a mathematical method for factoring a single matrix into a product of several simpler matrices. Because it is well known in the relevant field, a detailed description is omitted here.

As an example, when SVD is applied to the entries of a two-dimensional matrix such as that of FIG. 8, the analysis unit 140 may analyze the association between vehicle type (railway type) and route as an association between types of railway accident data. Similarly, from a two-dimensional matrix built on the railway type item and the accident/failure type item, the analysis unit 140 may analyze the association between vehicle type and accident/failure type. In this way, the analysis unit 140 may analyze the association between two different types, rather than between individual items of the railway accident data.

In general, applying SVD to a two-dimensional matrix yields a similarity from the row viewpoint and a similarity from the column viewpoint, respectively. The present application, however, analyzes the association between two types rather than between individual items. Accordingly, the analysis unit 140 may compute a similarity value between the two types, or may only determine whether the overall similarity between the two types is at or above a specific level (for example, 70%).
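The text does not state how a single association value is derived from the SVD, so the following is only one possible reading, not the method of the present application. It borrows the correspondence-analysis approach: take the SVD of the standardized residuals of the count matrix, sum the squared singular values (the total inertia, equal to chi-square divided by the number of accidents), and scale the result to Cramér's V in the range 0 to 1. The singular values are computed with a small one-sided Jacobi routine so that the sketch needs only the standard library. The function names are hypothetical; the count matrix reuses the counts quoted above where available, and the other cells are made up.

```python
import math


def singular_values(a, sweeps=60, tol=1e-12):
    """Singular values of a small dense matrix via one-sided Jacobi rotations."""
    m, n = len(a), len(a[0])
    u = [list(map(float, row)) for row in a]
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(u[i][p] ** 2 for i in range(m))
                beta = sum(u[i][q] ** 2 for i in range(m))
                gamma = sum(u[i][p] * u[i][q] for i in range(m))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns p and q are already orthogonal
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):  # rotate the two columns
                    up, uq = u[i][p], u[i][q]
                    u[i][p] = c * up - s * uq
                    u[i][q] = s * up + c * uq
        if not rotated:
            break
    # After convergence the column norms are the singular values.
    norms = [math.sqrt(sum(u[i][j] ** 2 for i in range(m))) for j in range(n)]
    return sorted(norms, reverse=True)


def cramers_v(table):
    """Cramér's V of a count matrix (at least 2x2, no empty row or column)."""
    n = float(sum(map(sum, table)))
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(col) / n for col in zip(*table)]
    if min(row_mass) == 0 or min(col_mass) == 0:
        raise ValueError("remove empty rows or columns first")
    residuals = [
        [(x / n - r * c) / math.sqrt(r * c) for x, c in zip(row, col_mass)]
        for row, r in zip(table, row_mass)
    ]
    inertia = sum(s * s for s in singular_values(residuals))  # chi-square / n
    return min(1.0, math.sqrt(inertia / (min(len(table), len(table[0])) - 1)))


# Rows: routes; columns: high-speed, conventional, urban.
# Row 1 and the last cell of row 2 follow the counts quoted in the text;
# every other cell is hypothetical.
table = [
    [5, 9, 11],
    [1, 4, 10],
    [6, 2, 1],
]
degree = cramers_v(table)
associated = degree >= 0.70  # 70% is the example level mentioned in the text
```

A value near 0 means accident counts are spread across routes in the same proportions for every railway type, and a value near 1 means each route is tied to essentially one railway type.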

The control unit 150 may perform control so that the result analyzed by the analysis unit 140 is displayed on a display screen.

The control unit 150 may cause the similarity value between the two types computed by the analysis unit 140 to be displayed on the display screen. In addition, when the analysis unit 140 has determined whether the similarity between the two types is at or above a specific level, the control unit 150 may, based on that result, cause the display screen to show whether an association exists (yes/no) between the types under evaluation.

An example of how the association analysis result is displayed can be understood more easily with reference to FIG. 9.

FIG. 9 shows an example in which the result of analyzing the association between types of railway accident data is displayed in a form that is easy for the user to understand, in the apparatus for analyzing associations between railway accident data types according to the second aspect of the present application.

Referring to FIG. 9, the display screen may show, as the association analysis result, the target type information, whether an association exists between the target types, and the degree of association. In more detail, as an example, the display screen may show 'vehicle type-route' as the target type information, indicate that the association between the two types (vehicle type and route) within the railway accident data is 'present (yes)', and show the degree of association as 85%.
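A display line of the kind described for FIG. 9 could be produced as in the sketch below. The function name `format_association`, the layout of the line, and the default threshold of 0.70 (the example level mentioned earlier) are assumptions made for illustration.

```python
def format_association(type_a: str, type_b: str, degree: float,
                       threshold: float = 0.70) -> str:
    """Format the target types, a yes/no association flag, and the degree."""
    present = "yes" if degree >= threshold else "no"
    return f"{type_a}-{type_b} | association: {present} | degree: {degree:.0%}"


line = format_association("vehicle type", "route", 0.85)
```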

In the embodiment according to the second aspect of the present application, the association analysis between two types has been illustrated as being based only on railway accident data. It is not limited to this, however, and may also be applied to data from various other fields.

The association analysis apparatus 100 according to the second aspect of the present application can easily compute the association between types of railway accident data through SVD.

In addition, the association analysis apparatus 100 according to the second aspect of the present application redefines the railway accident data, generates a two-dimensional matrix using some of the data items in the railway accident data, applies singular value decomposition (SVD) to the entries of the generated two-dimensional matrix to analyze the association between the data types corresponding to that matrix, and displays the analysis result on a display screen. It can thereby analyze whether an association exists between types, rather than between individual items of the railway accident data.

In addition, by presenting a method of constructing a two-dimensional matrix from railway accident data together with the result of applying SVD to that matrix, the association analysis apparatus 100 according to the second aspect of the present application can provide the association analysis result between railway accident data types in a form that the user can understand intuitively.

FIG. 10 is an operation flowchart of a method for predicting a train accident risk according to a third aspect of the present application.

The train accident risk prediction method shown in FIG. 10 may be performed by the train accident risk prediction apparatus 10 described above. Accordingly, even where details are omitted below, the description given for the train accident risk prediction apparatus 10 applies equally to the train accident risk prediction method.

Referring to FIG. 10, in step S11 the train accident risk prediction method according to the third aspect of the present application may receive new data to be subjected to train accident risk prediction.

Next, in step S12, a risk prediction model for predicting the train accident risk for the new data may be selected based on a data set including train accident data and running performance data.

Here, the data set includes a plurality of records, and each record may be generated by linking the train accident data and the running performance data on the basis of the operation date, the formation number, and the train number.
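The record linking described above can be sketched as a join on the three keys. This is a sketch under assumptions: the function names, the field names (`date`, `formation_no`, `train_no`, `type`, `distance_km`), and the choice to keep every running performance row even when no accident matches are all hypothetical. The date and formation number reuse the FIG. 5 example, while the train number and distance are made up.

```python
def record_key(row):
    """Join key: operation date, formation number, and train number."""
    return (row["date"], row["formation_no"], row["train_no"])


def build_records(accidents, runs):
    """Attach matching accident rows to each running performance row."""
    accidents_by_key = {}
    for accident in accidents:
        accidents_by_key.setdefault(record_key(accident), []).append(accident)
    records = []
    for run in runs:
        matched = accidents_by_key.get(record_key(run), [])
        records.append({
            **run,
            "accident_count": len(matched),
            "accident_types": [a["type"] for a in matched],
        })
    return records


# Hypothetical inputs; "T100" and the distances are invented for illustration.
accidents = [
    {"date": "20160828", "formation_no": "1C2004", "train_no": "T100",
     "type": "vehicle failure"},
]
runs = [
    {"date": "20160828", "formation_no": "1C2004", "train_no": "T100",
     "distance_km": 120.0},
    {"date": "20160829", "formation_no": "1C2004", "train_no": "T100",
     "distance_km": 95.0},
]
records = build_records(accidents, runs)
```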

Next, in step S13, the train accident risk (risk level) for the new data may be predicted based on the risk prediction model selected in step S12.

In step S13, the train accident risk for the new data may be predicted as one of normal, caution, warning, and danger.

In addition, although not shown in the drawing, the train accident risk prediction method according to the third aspect of the present application may include a step of performing pre-processing that normalizes the data stored in the data set.
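This passage does not say which normalization is used, so the sketch below assumes min-max scaling to the range 0 to 1 purely for illustration. The function name `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0.0 and the maximum to 1.0.

    A constant column carries no information, so it is mapped to all zeros.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical travel distances in km for three records.
scaled = min_max_normalize([10.0, 20.0, 30.0])
```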

Here, the step of performing pre-processing may include a step of calculating a severity for each record in the data set based on the normalized data, and a step of selecting at least one variable for generating the risk prediction model based on the normalized data.

In the step of calculating the severity, the severity may be calculated in view of the scale of accident damage and the accident frequency. The severity may also be calculated by further taking the delay time into account.

In addition, the train accident risk prediction method according to the third aspect of the present application may include a step of generating a risk prediction model based on the normalized data. In this step, the risk prediction model may be generated using the risk level value obtained from the severity calculation and the selected variables.

In addition, the train accident risk prediction method according to the third aspect of the present application may include a step of training the risk prediction model by applying a plurality of machine learning algorithms to the generated risk prediction model.

In the step of selecting the risk prediction model, the risk prediction model to which the machine learning algorithm showing the highest accuracy among the plurality of machine learning algorithms is applied may be selected, based on the training results, as the risk prediction model for predicting the train accident risk for the new data.

Here, the plurality of machine learning algorithms may include at least one of a random forest algorithm, a support vector machine (SVM) algorithm, and a k-nearest neighbors (KNN) algorithm.
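The selection of the most accurate model can be sketched as follows. The names `accuracy` and `select_best_model` are hypothetical, the validation data are made up, and the three candidate predictors are placeholders that merely stand in for trained random forest, SVM, and KNN models; they are not implementations of those algorithms.

```python
def accuracy(predict, features, labels):
    """Fraction of samples for which the predicted risk value equals the label."""
    hits = sum(1 for x, y in zip(features, labels) if predict(x) == y)
    return hits / len(labels)


def select_best_model(candidates, features, labels):
    """Return (best_name, scores); ties go to the first candidate listed."""
    scores = {name: accuracy(predict, features, labels)
              for name, predict in candidates.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Hypothetical held-out data: one normalized feature, risk labels 1-4.
features = [[0.1], [0.4], [0.6], [0.9]]
labels = [1, 2, 3, 4]

# Placeholder predictors standing in for trained models (NOT real algorithms).
candidates = {
    "random_forest": lambda x: 1 + min(3, int(x[0] * 4)),
    "svm": lambda x: 1 if x[0] < 0.5 else 4,
    "knn": lambda x: 2,
}
best_name, scores = select_best_model(candidates, features, labels)
```

In practice each candidate would be a model trained on the normalized data set, and accuracy would be measured on records held out from training.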

In the above description, steps S11 to S13 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present application. Some steps may also be omitted as needed, and the order of the steps may be changed.

FIG. 11 is an operation flowchart of a method for analyzing associations between railway accident data types according to a fourth aspect of the present application.

The method for analyzing associations between railway accident data types shown in FIG. 11 may be performed by the apparatus 100 for analyzing associations between railway accident data types described above. Accordingly, even where details are omitted below, the description given for the apparatus 100 applies equally to the method.

Referring to FIG. 11, in step S21 the acquired railway accident data may be redefined.

Next, in step S22, a two-dimensional matrix may be generated using some of the data items in the railway accident data.

In step S22, the two-dimensional matrix may be generated using the pivot table of Excel or the pivot grid of a multidimensional analysis (OLAP) tool.

Next, in step S23, the association between the data types corresponding to the two-dimensional matrix may be analyzed by applying singular value decomposition (SVD) to the entries of the two-dimensional matrix generated in step S22.

In step S23, it may be determined whether the association between the data types is at or above a preset similarity value.

Next, in step S24, the analysis result of step S23 may be displayed on a display screen.

In the above description, steps S21 to S24 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present application. Some steps may also be omitted as needed, and the order of the steps may be changed.

The train accident risk prediction method and the method for analyzing associations between railway accident data types according to an embodiment of the present application may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

The foregoing description of the present application is for illustration, and a person of ordinary skill in the art to which the present application belongs will understand that it can easily be modified into other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present application is indicated by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

10: Train accident risk prediction apparatus
11: Receiving unit
12: Risk prediction model selection unit
13: Prediction unit

Claims

A method for predicting a train accident risk, performed by a train accident risk prediction apparatus, the method comprising:
receiving, by the train accident risk prediction apparatus, new data to be subjected to train accident risk prediction;
selecting, by the train accident risk prediction apparatus, a risk prediction model for predicting a train accident risk for the new data based on a data set including train accident data and running performance data;
generating, by the train accident risk prediction apparatus and for the selection of the risk prediction model, a plurality of risk prediction models respectively corresponding to a plurality of machine learning algorithms by applying the plurality of machine learning algorithms to a risk prediction model generated based on data stored in the normalized data set, and performing learning on each of the generated plurality of risk prediction models; and
predicting, by the train accident risk prediction apparatus, the train accident risk for the new data as one of normal, caution, warning, and danger based on the selected risk prediction model,
wherein the data set is generated to include a plurality of records, each record being generated as a single record by linking, on the basis of operation date, formation number, and train number, the train accident data, which is information on past train accidents and train failures, with the running performance data, which is information on the number of train runs and the train travel distance, and
wherein the selecting of the risk prediction model comprises
selecting, based on the learning results and in view of the prediction accuracy of each risk prediction model corresponding to each of the plurality of machine learning algorithms, the risk prediction model to which the machine learning algorithm showing the highest accuracy is applied, from among the plurality of risk prediction models, as the risk prediction model for predicting the train accident risk for the new data.

The method according to claim 1,
further comprising performing, by the train accident risk prediction apparatus, pre-processing for normalizing the data stored in the data set.

The method according to claim 2,
wherein the performing of the pre-processing comprises:
calculating a severity for each record in the data set based on the normalized data; and
selecting at least one variable for generating a risk prediction model based on the normalized data,
wherein the risk prediction model generated based on the data stored in the normalized data set is generated using the risk level value obtained from the calculation of the severity and the selected variable.

The method according to claim 3,
wherein the calculating of the severity calculates the severity in view of the scale of accident damage and the accident frequency.

The method according to claim 4,
wherein the calculating of the severity calculates the severity by further taking the delay time into account.

(Deleted)

The method according to claim 1,
wherein the plurality of machine learning algorithms
include at least one of a random forest algorithm, a support vector machine (SVM) algorithm, and a k-nearest neighbors (KNN) algorithm.

(Deleted)

A train accident risk prediction apparatus comprising:
a receiving unit configured to receive new data to be subjected to train accident risk prediction;
a risk prediction model selection unit configured to select a risk prediction model for predicting a train accident risk for the new data based on a data set including train accident data and running performance data;
a learning unit configured to generate, for the selection of the risk prediction model, a plurality of risk prediction models respectively corresponding to a plurality of machine learning algorithms by applying the plurality of machine learning algorithms to a risk prediction model generated based on data stored in the normalized data set, and to perform learning on each of the generated plurality of risk prediction models; and
a prediction unit configured to predict the train accident risk for the new data as one of normal, caution, warning, and danger based on the selected risk prediction model,
wherein the data set is generated to include a plurality of records, each record being generated as a single record by linking, on the basis of operation date, formation number, and train number, the train accident data, which is information on past train accidents and train failures, with the running performance data, which is information on the number of train runs and the train travel distance, and
wherein the risk prediction model selection unit
selects, based on the learning results and in view of the prediction accuracy of each risk prediction model corresponding to each of the plurality of machine learning algorithms, the risk prediction model to which the machine learning algorithm showing the highest accuracy is applied, from among the plurality of risk prediction models, as the risk prediction model for predicting the train accident risk for the new data.

A computer-readable recording medium storing a program for executing the method of any one of claims 1 to 5 on a computer.