KR102042645B1

KR102042645B1 - Apparatus and method for predicting line risk using compound model

Info

Publication number: KR102042645B1
Application number: KR1020180167845A
Authority: KR
Inventors: 김상수; 정지수; 이현경; 정영기
Original assignee: (주)위세아이텍
Priority date: 2018-12-21
Filing date: 2018-12-21
Publication date: 2019-11-27

Abstract

The present invention relates to a track risk prediction apparatus and method using a compound model. According to the present invention, the track risk prediction method may comprise the steps of: collecting new data subject to track risk prediction; preprocessing the new data; generating a binomial classification model for predicting track risk for the new data based on a dataset including at least one of accident and failure data, track defects, maintenance data, weather data, and facility data; generating a multinomial classification model for predicting track risk for the new data based on the dataset; and predicting the track risk by combining the binomial classification model and the multinomial classification model.

Description

Apparatus and method for predicting track risk using complex model {APPARATUS AND METHOD FOR PREDICTING LINE RISK USING COMPOUND MODEL}

The present application relates to an apparatus and method for predicting track risk using a compound model, and more particularly to the development of a model, and a method, for predicting track risk using big-data-based machine learning algorithms.

A railway accident is among the most serious kinds of accident, since a single occurrence can cause heavy casualties and enormous losses. Nevertheless, technology for analyzing the risk of such accidents has so far not been adequately developed.

For example, techniques that analyze accident risk based on statistical methods are known in the related art. However, accident risk analysis based on statistical methods is limited in how accurately it can predict track risk.

The present application is intended to solve the above problems of the related art, and its object is to provide a method for developing a track risk prediction model that predicts track risk more accurately and can thereby effectively prevent track hazards in advance.

However, the technical problems to be solved by the embodiments of the present application are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical object, a track risk prediction method according to an embodiment of the present application may include: collecting new data subject to track risk prediction; preprocessing the new data; generating a binomial classification model for predicting track risk for the new data based on a dataset including at least one of accident and failure data, track defects, maintenance data, weather data, and facility data; generating a multinomial classification model for predicting track risk for the new data based on the dataset; and predicting the track risk by combining the binomial classification model and the multinomial classification model.

According to an embodiment of the present application, the new data may be track-related data associated with at least one of accident and failure data, track defects, maintenance data, weather data, and facility data.

According to an embodiment of the present application, the dataset may include a plurality of records, and each record may link accident and failure data, track defects, maintenance data, weather data, and facility data on the basis of the date, the kilometer post (track chainage), and the up/down direction code.

According to an embodiment of the present application, the step of preprocessing the new data may include exploring the data collected by the data collection unit to obtain the data to be analyzed, selecting from the obtained data the analysis variables to be used, and preprocessing the data corresponding to the selected variables.

According to an embodiment of the present application, the step of preprocessing the new data may normalize the variable values of the data by the Robust method, which uses the median and the interquartile range (IQR) to minimize the influence of outliers.

According to an embodiment of the present application, generating the binomial classification model may include: generating a risk prediction model by setting a binomial risk level value as the dependent variable and the selected variables as independent variables; training the generated risk prediction model by applying a plurality of machine learning algorithms; and calculating a track risk prediction value with the plurality of machine learning algorithms.

According to an embodiment of the present application, the plurality of machine learning algorithms may include at least one of Random Forest, Logistic Regression, XGBoost Classifier, Balanced Bagging Classifier, and an ensemble.

According to an embodiment of the present application, generating the multinomial classification model may include: generating a risk prediction model by setting a multinomial risk level value as the dependent variable and the selected variables as independent variables; training the generated risk prediction model by applying a plurality of machine learning algorithms; and calculating a track risk prediction value with the plurality of machine learning algorithms.

According to an embodiment of the present application, the plurality of machine learning algorithms may include at least one of K-Nearest Neighbor (knn), Support Vector Machine (svm), XGBoost Classifier, Balanced Bagging Classifier, and AdaBoost.

According to an embodiment of the present application, the step of predicting track risk by combining the binomial classification model and the multinomial classification model may provide, as the track risk prediction result, the value produced by the optimal algorithm selected, in consideration of accuracy and penalty, from the compound model formed by adding the binomial classification model and the multinomial classification model.

As a technical means for achieving the above technical object, a track risk prediction apparatus according to an embodiment of the present application may include: a data collection unit that collects new data subject to track risk prediction; a data preprocessing unit that preprocesses the data collected by the data collection unit; a binomial classification model unit that generates a binomial classification model for predicting track risk for the new data based on a dataset including at least one of accident and failure data, track defects, maintenance data, weather data, and facility data; a multinomial classification model unit that generates a multinomial classification model for predicting track risk for the new data based on the dataset; and a risk prediction unit that predicts track risk by combining the binomial classification model and the multinomial classification model.

The means for solving the problem described above are merely illustrative and should not be construed as limiting the present application. In addition to the exemplary embodiments described above, additional embodiments may exist in the drawings and the detailed description of the invention.

According to the means for solving the problem described above, the developed track risk prediction model predicts accident risk in advance, so a preemptive preventive effect can be expected. In addition, value can be created by applying, analyzing, and utilizing the results of this core technology development in other industrial fields.

FIG. 1 is a diagram illustrating a schematic configuration of a track risk prediction apparatus according to an embodiment of the present application.
FIG. 2 is a diagram showing data in which important variables have been selected by the track risk prediction apparatus according to various embodiments of the present application.
FIG. 3 is an exemplary view summarizing the binomial classification prediction accuracy calculated for each cluster by the track risk prediction apparatus according to various embodiments of the present application.
FIG. 4 is an exemplary view summarizing the multinomial classification prediction accuracy calculated for each cluster by the track risk prediction apparatus according to various embodiments of the present application.
FIG. 5 is an operational flowchart of a track risk prediction method according to various embodiments of the present application.

Hereinafter, embodiments of the present application are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present application pertains can readily practice them. The present application may, however, be implemented in many different forms and is not limited to the embodiments described here. In the drawings, parts irrelevant to the description are omitted in order to describe the present application clearly, and similar parts are given similar reference numerals throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" or "indirectly connected" with another element in between.

Throughout this specification, when a member is said to be located "on", "above", "at the top of", "under", "below", or "at the bottom of" another member, this includes not only the case where the member is in contact with the other member but also the case where a further member is present between the two.

Throughout this specification, when a part is said to "include" a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise.

Hereinafter, the track risk prediction apparatus and method of the present application are described with reference to FIGS. 1 to 5.

FIG. 1 is a diagram illustrating a schematic configuration of a track risk prediction apparatus according to an embodiment of the present application.

Referring to FIG. 1, the track risk prediction apparatus (10) may include a data collection unit (11), a data preprocessing unit (12), a binomial classification model unit (13), a multinomial classification model unit (14), and a risk prediction unit (15).

The data collection unit (11) may collect new data subject to track risk prediction. The new data may be track-related data associated with at least one of accident and failure data, track defects, maintenance data, weather data, and facility data.

According to an embodiment of the present application, the data collection unit (11) may collect train-related data from an external server over a network. Examples of networks for sharing information between the data collection unit (11) and the external server include, but are not limited to, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a 5G network, a WiMAX (Worldwide Interoperability for Microwave Access) network, the wired or wireless Internet, a LAN (Local Area Network), a wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Bluetooth network, a Wi-Fi network, an NFC (Near Field Communication) network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network.

In this case, the track risk prediction apparatus according to an embodiment of the present application may include a dataset generation unit that generates the dataset. The dataset may store accident and failure data, track defects, maintenance data, weather data, and facility data in integrated form. Accordingly, the new data may be included and stored in the dataset.

Here, the accident and failure data is data on past train accidents and train failures associated with the track, and may also be referred to as track accident data, railroad accident data, train accident data, railway accident data, and the like. The maintenance data may include information such as records of repairs made to track defects and records of repairs made to prevent defects. The weather data may include information such as weather conditions, temperature, diurnal temperature range, wind speed, and precipitation, which can cause track defects, twist, and deformation. The facility data may include information on tunnels, bridges, and the like. However, the components of the dataset are not limited to those listed above, and further information related to each kind of data may be included.

In addition, the dataset may consist of a plurality of records. A record may be generated by linking the accident and failure data, track defects, maintenance data, weather data, and facility data on the basis of the date, the kilometer post (track chainage), and the up/down direction code. In other words, for track risk prediction, the accident and failure data, track defects, maintenance data, weather data, and facility data may be linked by date, kilometer post, and up/down direction code and stored in the dataset cluster by cluster. Each item of data linked on this basis and stored in the dataset may be called one record.
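The linking of sources into records can be illustrated with a minimal Python sketch. It assumes each source is a list of rows that carry the three key fields; the field names (`date`, `km_post`, `direction`), the source names, and all sample values are hypothetical and are not taken from the specification.

```python
from collections import defaultdict

KEY_FIELDS = ("date", "km_post", "direction")  # hypothetical field names

def build_records(sources):
    """Merge rows from several sources that share (date, km_post, direction)."""
    records = defaultdict(dict)
    for source_name, rows in sources.items():
        for row in rows:
            key = tuple(row[field] for field in KEY_FIELDS)
            for field, value in row.items():
                if field not in KEY_FIELDS:
                    # Prefix with the source name so columns never collide.
                    records[key][f"{source_name}.{field}"] = value
    return dict(records)

# Made-up sample rows, for illustration only.
sources = {
    "weather": [{"date": "2018-07-01", "km_post": 12.5, "direction": "UP", "max_temp": 34.1}],
    "defect": [{"date": "2018-07-01", "km_post": 12.5, "direction": "UP", "twist_mm": 6.0}],
    "facility": [{"date": "2018-07-01", "km_post": 12.5, "direction": "UP", "tunnel": False}],
}
records = build_records(sources)
```

Here the three rows collapse into a single record because they share the same date, kilometer post, and direction code.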

The data preprocessing unit (12) may preprocess the new data. Specifically, the data preprocessing unit (12) may explore the data collected by the data collection unit (11) to obtain the data to be analyzed, select from the obtained data the analysis variables to be used, and preprocess the data corresponding to the selected variables. That is, the data preprocessing unit (12) may perform data normalization, which brings the variable values of the data stored in the dataset onto a common scale.

In addition, the data preprocessing unit (12) may calculate a severity in consideration of the magnitude of accident damage and the frequency of accidents by cause. In particular, the preprocessing unit (12) may calculate the severity of each record in the dataset by multiplying the magnitude of accident damage by the frequency by accident cause.

In other words, the severity may be calculated in consideration of the accident frequency and the magnitude of accident damage. The preprocessing unit (12) may calculate the severity of each record in the dataset by multiplying the accident frequency by the damage magnitude of each accident. Here, the magnitude of accident damage may be calculated from the property damage incurred in a track-related accident. It may also be calculated by adding the operating (business) loss to that property damage. That is, the magnitude of accident damage may take into account the operating loss in addition to the property damage incurred in a track-related accident.
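The severity calculation can be written out as a short Python sketch. It follows the product described above (frequency multiplied by damage magnitude, with damage magnitude as property damage plus an optional operating loss); the function names, parameter names, and monetary units are assumptions made for illustration.

```python
def damage_magnitude(property_damage, operating_loss=0.0):
    """Damage magnitude: property damage, optionally plus operating loss."""
    return property_damage + operating_loss

def severity(accident_frequency, property_damage, operating_loss=0.0):
    """Severity of one record: accident frequency times damage magnitude."""
    return accident_frequency * damage_magnitude(property_damage, operating_loss)

# Made-up figures: 3 accidents, 100.0 property damage, 20.0 operating loss.
example = severity(3, 100.0, 20.0)
```

The text also mentions train delay time as a further factor, but does not say how it enters the calculation, so it is left out of this sketch.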

In addition, the preprocessing unit (12) may calculate the severity by further considering delay time, in addition to the magnitude of accident damage and the accident frequency. Here, delay time may mean the train delay caused by a track-related accident.

The data preprocessing unit (12) may normalize the variable values of the data by the Robust method, which uses the median and the interquartile range (IQR) to minimize the influence of outliers. Specifically, to improve the stability and accuracy of the risk prediction model and to effectively reduce the error in the data stored in the dataset, the data preprocessing unit (12) may apply Robust normalization to the irregular parts of the data, that is, to data whose variable values are not on a consistent scale. Through Robust normalization, the data preprocessing unit (12) can normalize the variable values of the data stored in the dataset using the median and the IQR while minimizing the influence of outliers.
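A minimal Python sketch of median/IQR scaling is given below, assuming the usual form (x - median) / IQR. The specification does not state which quartile convention is used; this sketch assumes the "inclusive" method of `statistics.quantiles`, and the function name and sample values are illustrative.

```python
import statistics

def robust_scale(values):
    """Scale values as (x - median) / IQR, which is insensitive to outliers."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: if all central values coincide (IQR == 0), only center the data.
    scale = iqr if iqr != 0 else 1.0
    return [(v - median) / scale for v in values]

# The outlier 100 does not distort the scaling of the other four values.
scaled = robust_scale([1, 2, 3, 4, 100])
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

Because the median and quartiles ignore the extreme value, the ordinary observations stay within a narrow range, whereas min-max or mean/standard-deviation scaling would compress them toward a single point.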

In addition, the preprocessing unit (12) may select at least one variable for generating the track risk prediction model based on the normalized data (that is, the normalized data stored in the dataset). In other words, the data preprocessing unit (12) may select the variables (important variables) needed for track risk prediction based on the normalized data. Variables are selected so that only those that influence the track risk prediction model are used when the model is generated.

According to an embodiment of the present application, the preprocessing unit (12) may select at least one variable for generating the risk prediction model based on the normalized data (that is, the normalized data stored in the dataset). In other words, the preprocessing unit (12) may select the variables (important variables) needed for track risk prediction based on the normalized data. Variables are selected so that only those that influence the risk prediction model are used when the model is generated.

FIG. 2 is a diagram showing data in which important variables have been selected by the track risk prediction apparatus according to various embodiments of the present application.

Referring to FIG. 2, the important variables stored in the dataset are normalized data and may include, for example, at least one of maximum temperature, diurnal temperature range, 3-day average wind speed, maximum wind speed, presence of a tunnel, maximum hourly precipitation, twist classification, season, presence of a bridge, daily precipitation, whether 33 °C persisted for 3 days, and left-rail longitudinal level (면좌), together with an importance value for each variable.

The binomial classification model unit (13) may generate a binomial classification model for predicting track risk for the new data based on a dataset including at least one of accident and failure data, track defects, maintenance data, weather data, and facility data.

According to an embodiment of the present application, the binomial classification model unit (13) may generate a risk prediction model by setting a binomial risk level value as the dependent variable and the selected variables as independent variables, train the generated risk prediction model by applying a plurality of machine learning algorithms, and calculate a track risk prediction value with the plurality of machine learning algorithms. In the present application, the two Korean terms for "model" (모델 and 모형) are used interchangeably.

Here, the plurality of machine learning algorithms may include at least one of Random Forest, Logistic Regression, XGBoost Classifier, Balanced Bagging Classifier, and an ensemble. Random Forest is an algorithm in which several decision trees form a forest and their individual predictions are averaged into a single result. Logistic Regression solves classification problems by applying a sigmoid function to the linear model used for linear prediction. Whereas the trees in a Random Forest are independent, XGBoost is a boosting algorithm in which the result of each tree is applied to the next tree. Balanced Bagging Classifier is an algorithm that draws samples repeatedly, trains a model on each sample, and aggregates the results.
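The Logistic Regression description (a sigmoid applied to a linear model) can be made concrete with a small Python sketch. The weights, bias, and feature values below are made up rather than learned from data, and the two class labels are an English rendering of the binomial risk levels; training is omitted entirely.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def danger_probability(features, weights, bias):
    """Linear model output passed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def classify(features, weights, bias, threshold=0.5):
    """Binomial decision: 'danger' if the probability reaches the threshold."""
    p = danger_probability(features, weights, bias)
    return "danger" if p >= threshold else "attention"

# Hypothetical weights for two normalized variables.
label_high = classify([1.0, 2.0], weights=[0.5, 0.5], bias=-1.0)
label_low = classify([0.0, 0.0], weights=[0.5, 0.5], bias=-1.0)
```

With these illustrative numbers the first input has a linear score of 0.5 and is labeled "danger", while the second has a score of -1.0 and is labeled "attention".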

FIG. 3 is an exemplary view summarizing the binomial classification prediction accuracy calculated for each cluster by the track risk prediction apparatus according to various embodiments of the present application.

Referring to FIG. 3, the binomial classification model unit (13) may divide the binomial risk level value into two classes. For example, the binomial risk level value may be classified as "attention" or "danger". An attention or danger value may also be shown for each cluster stored in the dataset.

The binomial classification model unit (13) may apply each of the plurality of machine learning algorithms to the generated track risk prediction model and thereby calculate a track risk prediction value for each algorithm. In other words, because the generated track risk prediction model is trained with each of the plurality of machine learning algorithms, a track risk prediction is obtained as the result of applying each algorithm.

Based on the training results, the binomial classification model unit (13) may select, as the track risk prediction model, the risk prediction model to which the optimal machine learning algorithm, chosen from the plurality of machine learning algorithms in consideration of accuracy and penalty, has been applied.

The multinomial classification model unit (14) may generate a multinomial classification model for predicting track risk for the new data based on the dataset.

According to an embodiment of the present application, the multinomial classification model unit (14) may generate a risk prediction model by setting a multinomial risk level value as the dependent variable and the selected variables as independent variables, train the generated risk prediction model by applying a plurality of machine learning algorithms, and calculate a track risk prediction value with the plurality of machine learning algorithms. That is, the risk prediction model generated by the multinomial classification model unit (14) may be trained by applying a plurality of machine learning algorithms.

Here, the plurality of machine learning algorithms may include at least one of K-Nearest Neighbor (knn), Support Vector Machine (svm), XGBoost Classifier, Balanced Bagging Classifier, and AdaBoost. K-Nearest Neighbor classifies a data point by referring to the labels of the K other data points closest to it. Support Vector Machine divides two categories by measuring the distances between data points, finding the center between the two groups of data, and obtaining the optimal hyperplane. AdaBoost improves performance by reflecting the error of a previous weak model's result in the weights of another weak model.
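The K-Nearest Neighbor rule described above can be sketched in a few lines of Python. The training points, the Euclidean distance metric, the choice of K = 3, and the English class labels are illustrative assumptions; the specification does not fix any of them.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest training points.

    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, the label encountered first among the nearest points wins.
    return votes.most_common(1)[0][0]

# Made-up, already-normalized two-variable training data.
train = [
    ((0.0, 0.0), "caution"), ((0.1, 0.2), "caution"),
    ((1.0, 1.0), "alert"), ((1.1, 0.9), "alert"),
    ((3.0, 3.0), "serious"), ((3.2, 2.9), "serious"),
]
predicted = knn_predict(train, (0.9, 1.0), k=3)
print(predicted)  # alert
```

Because the vote is taken over labels rather than a threshold, the same function handles any number of risk classes, which is why it suits the multinomial case.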

FIG. 4 is an exemplary view summarizing the multinomial classification prediction accuracy calculated for each cluster by the track risk prediction apparatus according to various embodiments of the present application.

Referring to FIG. 4, the multinomial classification model unit (14) may divide the multinomial risk level value into a plurality of classes. For example, the multinomial risk level value may be classified as "caution", "alert", or "serious". In addition, the track risk prediction model generated by the multinomial classification model unit (14) may be trained with each of the plurality of machine learning algorithms, so that a track risk prediction value is calculated for each algorithm. In other words, a track risk prediction is obtained as the result of applying each of the plurality of machine learning algorithms to the generated track risk prediction model.

Based on the learning results, the polynomial classification model unit 14 may select, as the track risk prediction model, the risk prediction model to which the optimal machine learning algorithm is applied, considering the accuracy and penalty of each of the plurality of machine learning algorithms. For example, if the accuracy of the K-Nearest Neighbor (knn) algorithm is 70%, the accuracy of the Support Vector Machine (svm) algorithm is 80%, and the penalties of the two algorithms are equal, the polynomial classification model unit 14 may select the risk prediction model to which the more accurate Support Vector Machine (svm) algorithm is applied as the track risk prediction model.
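The selection rule above can be sketched as follows. The text does not define how accuracy and penalty are combined, so this sketch assumes a simple accuracy-minus-penalty score; the `Candidate` class, the `select_best` function, and the penalty values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    accuracy: float  # fraction of correct predictions on evaluation data
    penalty: float   # hypothetical cost term; the text does not define it

def select_best(candidates):
    """Return the candidate with the highest accuracy-minus-penalty score."""
    # Assumption: accuracy and penalty are combined by simple subtraction.
    return max(candidates, key=lambda c: c.accuracy - c.penalty)

# The example from the text: knn at 70%, svm at 80%, equal penalties.
candidates = [
    Candidate("knn", accuracy=0.70, penalty=0.05),
    Candidate("svm", accuracy=0.80, penalty=0.05),
]
best = select_best(candidates)
```

When the penalties are equal, the score reduces to a comparison of accuracy, so the svm candidate is selected, consistent with the example in the text.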

The risk prediction unit 15 may predict the track risk by combining the binomial classification model and the polynomial classification model. In doing so, the risk prediction unit 15 may predict the track risk as one of the plurality of classes of the polynomial risk level value. For example, the risk prediction unit 15 may predict the track risk as one of caution, alert, and serious.

According to an embodiment of the present application, the risk prediction unit 15 may provide, as the track risk prediction result, the result value derived from the optimal algorithm selected in consideration of accuracy and penalty in the complex model that adds the binomial classification model and the polynomial classification model. The risk prediction unit 15 may generate an algorithm that combines the track risk prediction model selected by the binomial classification model unit 13 with the track risk prediction model selected by the polynomial classification model unit 14, and may provide the result value derived from that algorithm as the track risk prediction result.

The risk prediction unit 15 may predict the track risk based on the complex model that adds the binomial classification model and the polynomial classification model. For example, a binomial track risk level value may be predicted based on a track risk prediction model generated with the Balanced Bagging Classifier algorithm. In addition, a track risk level value may be predicted based on a polynomial track risk prediction model generated with the K-Nearest Neighbor algorithm. The risk prediction unit 15 may provide the result value derived from the optimal algorithm, selected in consideration of accuracy and penalty in the complex model, as the track risk prediction result for the corresponding kilometer post.
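The text does not specify how the two selected models are combined. One possible arrangement, shown here purely as an assumption, is a gating scheme in which the binomial model decides whether a record is at risk at all and the polynomial model then supplies the risk level. The function `combine_models`, the stand-in threshold models, the `defect_count` feature, and the English level names are all hypothetical.

```python
RISK_LEVELS = ("caution", "alert", "serious")

def combine_models(binomial_model, polynomial_model, no_risk_level="caution"):
    """Build one predictor from a binary model and a multi-class model.

    Assumed gating rule (not specified in the text): the binary model decides
    whether a record is at risk; only then is the multi-class level used.
    """
    def predict(features):
        if binomial_model(features) == 0:
            return no_risk_level
        level = polynomial_model(features)
        if level not in RISK_LEVELS:
            raise ValueError(f"unexpected risk level: {level!r}")
        return level
    return predict

# Hypothetical stand-ins for the two selected models (simple threshold rules).
def binomial_stub(features):
    return 1 if features["defect_count"] > 0 else 0

def polynomial_stub(features):
    if features["defect_count"] >= 5:
        return "serious"
    return "alert" if features["defect_count"] >= 2 else "caution"

predict_risk = combine_models(binomial_stub, polynomial_stub)
result = predict_risk({"defect_count": 6})
```

In practice the two stand-in functions would be replaced by the trained models selected by the binomial and polynomial classification model units. Other combination rules, such as weighting the two outputs, would fit the description equally well.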

The track risk prediction apparatus of the present application may link accident failure data with track defects and store them as big data in the dataset. In addition, the present application enables a machine learning algorithm to predict the track risk. In other words, by predicting the track risk with big data-based machine learning algorithms, the present application can achieve a more accurate, improved prediction rate.

FIG. 5 is an operation flowchart of a track risk prediction method according to various embodiments of the present application.

The track risk prediction method shown in FIG. 5 may be performed by the track risk prediction apparatus 10 described above. Therefore, even where omitted below, the description given for the track risk prediction apparatus 10 applies equally to the description of the track risk prediction method.

Referring to FIG. 5, in the track risk prediction method, the track risk prediction apparatus may collect new data that is the target of track risk prediction (S100).

The new data may be track-related data associated with at least one of accident failure data, track defects, maintenance data, weather data, and facility data.

Next, the new data collected in step S100 may be preprocessed (S200).

In addition, in step S200, the data collected by the data collection unit 11 may be explored to obtain the data to be analyzed, the analysis variables may be selected from the obtained data, and the data corresponding to the selected variables may be preprocessed.

In step S200, the preprocessing may use the Robust method, which normalizes the variable values of the data using the median and the interquartile range (IQR) so as to minimize the influence of outliers.
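A minimal sketch of median and IQR based scaling is given below for illustration. The function name `robust_scale` and the sample readings are hypothetical, and the quartiles are computed with linear interpolation (the `inclusive` method of Python's `statistics.quantiles`), which is an assumed choice; the text does not state a quartile convention.

```python
import statistics

def robust_scale(values):
    """Scale each value as (x - median) / IQR."""
    median = statistics.median(values)
    # Quartiles with linear interpolation over the observed data.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Degenerate spread: center only, to avoid division by zero.
        return [x - median for x in values]
    return [(x - median) / iqr for x in values]

# Hypothetical readings with one extreme outlier (100.0).
readings = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = robust_scale(readings)
```

Because the median (3.0) and the IQR (2.0) are unaffected by the outlier, the four ordinary readings keep a sensible spread after scaling, whereas mean and standard deviation scaling would compress them.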

Next, a binomial classification model for predicting the track risk for the new data may be generated based on a dataset including at least one of accident failure data, track defects, maintenance data, weather data, and facility data (S300).

Here, the dataset includes a plurality of records, and each record may link accident failure data, track defects, maintenance data, weather data, and facility data on the basis of date, kilometer post, and up/down line code.
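The record linkage described above can be sketched as a join on a composite key. The function `build_records`, the field names `date`, `km_post`, and `direction`, and the sample rows are hypothetical; the sketch also assumes an outer-join style merge in which a record is kept even when some sources have no matching row.

```python
KEY_FIELDS = ("date", "km_post", "direction")

def build_records(sources):
    """Merge rows from several sources that share (date, km_post, direction)."""
    records = {}
    for source_name, rows in sources.items():
        for row in rows:
            key = tuple(row[field] for field in KEY_FIELDS)
            # Create the record on first sight of this key, then extend it.
            record = records.setdefault(key, dict(zip(KEY_FIELDS, key)))
            for field, value in row.items():
                if field not in KEY_FIELDS:
                    # Prefix with the source name to avoid field collisions.
                    record[f"{source_name}_{field}"] = value
    return list(records.values())

# Hypothetical sample rows from two of the data sources.
sources = {
    "defect": [
        {"date": "2018-07-01", "km_post": 12.4, "direction": "up", "count": 2},
    ],
    "weather": [
        {"date": "2018-07-01", "km_post": 12.4, "direction": "up", "temp_c": 31.5},
        {"date": "2018-07-01", "km_post": 12.4, "direction": "down", "temp_c": 31.5},
    ],
}
records = build_records(sources)
```

Each resulting record carries the three key fields plus whatever attributes the matching source rows contributed.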

In step S300, a risk prediction model may be generated by setting the binomial risk level value as the dependent variable and the selected variables as the independent variables, and the generated risk prediction model may be trained by applying a plurality of machine learning algorithms. In addition, in step S300, a track risk prediction value may be calculated by the plurality of machine learning algorithms.

Here, the plurality of machine learning algorithms may include at least one of Random Forest, Logistic Regression, XGBoost Classifier, Balanced Bagging Classifier, and Ensemble.

Next, a polynomial classification model for predicting the track risk for the new data may be generated based on the dataset (S400).

In step S400, a risk prediction model may be generated by setting the polynomial risk level value as the dependent variable and the selected variables as the independent variables, and the generated risk prediction model may be trained by applying a plurality of machine learning algorithms. Step S400 may also include calculating a track risk prediction value by the plurality of machine learning algorithms.

Here, the plurality of machine learning algorithms may include at least one of K-Nearest Neighbor (knn), Support Vector Machine (svm), XGBoost Classifier, Balanced Bagging Classifier, and Adaboost.

Next, the track risk may be predicted by combining the binomial classification model and the polynomial classification model (S500).

In step S500, the track risk may be predicted by combining the binomial classification model and the polynomial classification model, and the result value derived from the optimal algorithm, selected in consideration of accuracy and penalty in the complex model that adds the two models, may be provided as the track risk prediction result.

In the above description, steps S100 to S500 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present application. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

The track risk prediction method performed by the track risk prediction apparatus according to an embodiment of the present application may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

In addition, the track risk prediction method performed by the track risk prediction apparatus described above may also be implemented in the form of a computer program or application stored on a recording medium and executed by a computer.

The foregoing description of the present application is for illustration, and those of ordinary skill in the art to which the present application pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present application is indicated by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

10: Track risk prediction apparatus
11: Data collection unit
12: Data preprocessing unit
13: Binomial classification model unit
14: Polynomial classification model unit
15: Risk prediction unit

Claims

A track risk prediction method performed by a track risk prediction apparatus, the method comprising:
collecting new data subject to track risk prediction;
performing preprocessing of the new data;
generating a binomial classification model for predicting a track risk for the new data by setting, based on a dataset including at least one of accident failure data, track defects, maintenance data, weather data, and facility data, a binomial risk level value as a dependent variable and selected variables as independent variables;
generating a polynomial classification model for predicting the track risk for the new data by setting, based on the dataset, a polynomial risk level value as a dependent variable and the selected variables as independent variables;
training the binomial classification model and the polynomial classification model by applying a plurality of machine learning algorithms to the generated binomial classification model and polynomial classification model;
selecting, as track risk prediction models, the binomial classification model and the polynomial classification model to which the optimal machine learning algorithm is applied, in consideration of accuracy and penalty among the plurality of machine learning algorithms based on the training results; and
predicting the track risk by combining the selected binomial classification model and polynomial classification model,
wherein the dataset comprises a plurality of records,
wherein each record is generated by linking accident failure data, track defects, maintenance data, weather data, and facility data on the basis of date, kilometer post, and up/down line code, and
wherein predicting the track risk comprises generating a complex model algorithm that combines the selected binomial classification model and polynomial classification model, and providing a result value derived from the complex model algorithm as a track risk prediction result.

The method of claim 1,
wherein the new data is track-related data associated with at least one of accident failure data, track defects, maintenance data, weather data, and facility data.

(Deleted)

The method of claim 1,
wherein performing preprocessing of the new data comprises:
obtaining data to be analyzed by exploring the data collected in the collecting of the new data; and
selecting analysis variables to be analyzed from the obtained data and performing preprocessing of the data corresponding to the selected variables.

The method of claim 4,
wherein performing preprocessing of the new data comprises
normalizing the variable values of the data using a Robust method that uses the median and the interquartile range (IQR) to minimize the influence of outliers.

(Deleted)

The method of claim 1,
wherein the plurality of machine learning algorithms includes at least one of Random Forest, Logistic Regression, XGBoost Classifier, Balanced Bagging Classifier, and Ensemble.

The method of claim 7,
wherein generating the polynomial classification model comprises:
generating a risk prediction model by setting a polynomial risk level value as a dependent variable and the selected variables as independent variables;
training the generated risk prediction model by applying a plurality of machine learning algorithms; and
calculating a track risk prediction value using the plurality of machine learning algorithms.

The method of claim 8,
wherein the plurality of machine learning algorithms includes at least one of K-Nearest Neighbor (knn), Support Vector Machine (svm), XGBoost Classifier, Balanced Bagging Classifier, and Adaboost.

(Deleted)

A track risk prediction apparatus comprising:
a data collection unit configured to collect new data that is a target of track risk prediction;
a data preprocessing unit configured to preprocess the data collected by the data collection unit;
a binomial classification model unit configured to generate a binomial classification model for predicting a track risk by setting, based on a dataset including at least one of accident failure data, track defects, maintenance data, weather data, and facility data, a binomial risk level value as a dependent variable and selected variables as independent variables;
a polynomial classification model unit configured to generate a polynomial classification model for predicting the track risk for the new data by setting, based on the dataset, a polynomial risk level value as a dependent variable and the selected variables as independent variables; and
a risk prediction unit configured to predict the track risk by combining the binomial classification model and the polynomial classification model,
wherein the dataset comprises a plurality of records,
wherein each record is generated by linking accident failure data, track defects, maintenance data, weather data, and facility data on the basis of date, kilometer post, and up/down line code,
wherein the binomial classification model unit and the polynomial classification model unit
train the binomial classification model and the polynomial classification model by applying a plurality of machine learning algorithms to the generated binomial classification model and polynomial classification model, and, based on the training results, select as track risk prediction models the binomial classification model and the polynomial classification model to which the optimal machine learning algorithm is applied, in consideration of accuracy and penalty among the plurality of machine learning algorithms, and
wherein the risk prediction unit
generates a complex model algorithm that combines the binomial classification model and the polynomial classification model, and provides a result value derived from the complex model algorithm as a track risk prediction result.

A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 1, 2, 4, 5 and 7-9.