KR20180086602A - Apparatus and method for estimating traffic jam area based on machine learning

Info

Publication number: KR20180086602A
Application number: KR1020170010225A
Authority: KR
Inventors: 김경섭; 전우혁; 박경빈; 최지인
Original assignee: 충남대학교산학협력단
Priority date: 2017-01-23
Filing date: 2017-01-23
Publication date: 2018-08-01

Abstract

The present invention relates to an apparatus and a method for estimating traffic jam areas based on machine learning. The apparatus includes a data collection unit that collects at least one of weather data, traffic data, and road location data; a merged data generation unit that merges the traffic data, the road location data, and the weather data to generate merged data; and a machine learning unit that learns from the merged data by machine learning and generates a decision tree model for predicting traffic jam areas under each weather condition.

Description

The present invention relates to an apparatus and method for estimating traffic congestion sections using machine learning.

"Big data," which has emerged with recent advances in information and communication technology, can be defined as data that is difficult to store, manage, and analyze with existing software. Using big data makes it possible to provide a variety of services, such as identifying social issues and user needs, establishing future strategies, and delivering proactive public services.

Among such big data, meteorological and climate big data is big data that contains various kinds of information related to weather and climate. It can refer to weather-related big data that enables new business creation through the analysis of meteorological data, the prediction of future weather information, and convergence with other industries such as energy, tourism, and weather insurance.

Because meteorological and climate big data is not based on personal details, it can be free from personal information protection issues, and combining it with socio-economic data can produce synergy.

Traffic congestion is affected by many factors, such as accidents, construction, and weather. Moreover, even accidents caused by driver carelessness are often influenced by the weather. Conventionally, however, there has been no effective way to determine how much the weather affects traffic.

Korean Registered Patent Publication No. 10-0695918 (March 9, 2007)

An object of the present invention is to solve the above problem by linking weather data and traffic data and applying machine learning to them, thereby generating a decision tree model that predicts traffic congestion sections by weather and estimating traffic congestion sections.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above object, an apparatus for estimating traffic congestion sections using machine learning according to an embodiment of the present invention includes: a data collection unit that collects at least one of weather data, traffic data, and road location data; a merged data generation unit that merges the traffic data, the road location data, and the weather data to generate merged data; and a machine learning unit that learns from the merged data by machine learning and generates a decision tree model that predicts traffic congestion sections by weather.

According to an embodiment of the present invention, weather data and traffic data are used together to estimate traffic congestion sections, so traffic congestion sections can be estimated more accurately for each type of weather.

FIG. 1 is a block diagram illustrating a traffic congestion section estimating apparatus according to an embodiment of the present invention.
FIG. 2 is a view for explaining the weather data used by the traffic congestion section estimating apparatus according to the embodiment of the present invention.
FIG. 3 is a view for explaining the traffic data used by the traffic congestion section estimating apparatus according to the embodiment of the present invention.
FIG. 4 is a view for explaining the road location data used by the traffic congestion section estimating apparatus according to the embodiment of the present invention.
FIG. 5 is a view for explaining the merged data generation unit included in the traffic congestion section estimating apparatus according to the embodiment of the present invention.
FIGS. 6 to 8 are views for explaining the machine learning unit included in the traffic congestion section estimating apparatus according to the embodiment of the present invention.

Hereinafter, the most preferred embodiment of the present invention will be described with reference to the accompanying drawings, in sufficient detail that a person of ordinary skill in the art to which the invention pertains can readily practice its technical idea. In adding reference numerals to the components in each drawing, note that the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related known configurations or functions are omitted where they might obscure the gist of the invention.

Hereinafter, an apparatus and method for estimating traffic congestion sections using machine learning according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

As shown in FIG. 1, the apparatus 100 for estimating traffic congestion sections using machine learning according to an embodiment of the present invention includes a data collection unit 110, a merged data generation unit 120, and a machine learning unit 130.

The data collection unit 110 collects at least one of weather data, traffic data, and road location data.

The merged data generation unit 120 merges the traffic data, the road location data, and the weather data to generate merged data.

The machine learning unit 130 learns from the merged data by machine learning and generates a decision tree model that predicts traffic congestion sections by weather.

For example, the apparatus 100 according to an embodiment of the present invention may refer to an apparatus that collects weather data and traffic data and then estimates weather-related traffic congestion sections using a machine learning algorithm executed in a distributed system environment.

In other words, the apparatus 100 according to the embodiment of the present invention can analyze large volumes of weather data and traffic data to predict how traffic will flow on major roads depending on the weather.

The apparatus 100 according to the embodiment of the present invention can use a statistical analysis tool and a distributed processing system to analyze large volumes of weather data and traffic data and to make predictions from them.

According to one embodiment, the apparatus 100 may use Hadoop, a representative processing-system solution widely used for statistical software development and data analysis. On top of Hadoop it may install Spark, MapReduce, the Hadoop Distributed File System (HDFS), HBase (an open-source non-relational distributed database for the Hadoop platform), and YARN (Yet Another Resource Negotiator, the resource manager of Hadoop 2.0) in order to analyze large volumes of data effectively.

For example, the apparatus 100 can use HDFS to store large volumes of data reliably on the distributed processing system, manage the state of the distributed system through HBase, process the distributed data with MapReduce and YARN, and analyze the data with Spark's machine learning.

For example, Apache Hadoop (High-Availability Distributed Object-Oriented Platform) is a freeware Java software framework that supports distributed applications running on large computer clusters capable of processing large volumes of data. It was developed to support distributed processing for Nutch and may be regarded as a subproject of Apache Lucene. Apache Hadoop implements the Hadoop Distributed File System (HDFS), which can substitute for the Google File System, and MapReduce.

HDFS is a distributed, scalable file system written in Java for the Hadoop framework. It splits large files and stores them across multiple machines, and it achieves data reliability by storing the data redundantly on multiple servers.

When HDFS is adopted, RAID storage is not required on the hosts. HDFS has a master/slave structure: an HDFS cluster consists of a single NameNode, which is the master server that manages the file system and controls client access, and each node in the cluster has one DataNode, which manages the storage attached to the node on which it runs.

MapReduce is a software framework that Google created to process large volumes of data with distributed parallel computing and announced in 2004. It was developed to support parallel processing of large data sets, of a petabyte or more, in cluster environments built from unreliable computers, and it is built mainly on the Map and Reduce functions commonly used in functional programming.
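The Map and Reduce pattern can be sketched in plain Python on a single machine. This is only an illustration of the programming model, not of Hadoop itself; the record layout and the function names `map_phase`, `shuffle`, and `reduce_phase` are assumptions made for the example.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical traffic records: (road_id, speed in km/h).
records = [("R1", 60.0), ("R2", 30.0), ("R1", 80.0), ("R2", 50.0), ("R1", 70.0)]

def map_phase(record):
    """Map: emit a (key, value) pair; the value carries (sum, count)."""
    road_id, speed = record
    return road_id, (speed, 1)

def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(values):
    """Reduce: combine partial (sum, count) pairs, then compute the mean."""
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
    return total / count

# Mean speed per road, computed as map -> shuffle -> reduce.
mean_speed = {key: reduce_phase(values)
              for key, values in shuffle(map(map_phase, records)).items()}
```

Because each map call looks at one record and each reduce call looks at one key, both phases can in principle be spread across many machines, which is the property the framework exploits.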

Apache Spark is a general-purpose, high-performance distributed platform. It is an open-source cluster computing framework developed at AMPLab at the University of California, Berkeley, and later donated to the Apache Software Foundation, which now maintains it.

Whereas Hadoop runs MapReduce jobs on disk and is therefore slow, Spark moves processing into memory and delivers fast performance. Spark's main features include MapReduce, streaming data handling, SQL-based data queries, a machine learning library, and graph data processing.

Embodiments of the weather data, traffic data, and road location data collected by the data collection unit 110 will now be described with reference to FIGS. 2 to 4.

FIG. 2 is a view for explaining the weather data used by the traffic congestion section estimating apparatus according to the embodiment of the present invention. FIG. 3 is a view for explaining the traffic data used by the apparatus. FIG. 4 is a view for explaining the road location data used by the apparatus.

For example, the weather data collected by the data collection unit 110 may be as shown in FIG. 2, and may refer to weather data such as station, date and time, average temperature, minimum temperature, time of minimum temperature, maximum temperature, and daily precipitation, collected in the form of a CSV file.

For example, the traffic data collected by the data collection unit 110 may be as shown in FIG. 3, and may refer to traffic flow information collected in the form of a CSV file.

For example, the road location data collected by the data collection unit 110 may be as shown in FIG. 4, and may refer to road locations taken from the standard node-link data and collected in the form of a CSV file.

Although each type of data collected by the data collection unit 110 has been described as a CSV file, the present invention is not limited thereto.
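As a minimal sketch of how such CSV data might be read, the following Python snippet parses a small weather file with the standard `csv` module. The column names, the sample values, and the helper names `parse_float` and `load_weather` are all hypothetical; the actual layout is the one shown in FIG. 2.

```python
import csv
import io

# Hypothetical excerpt of a daily weather CSV; the column names are assumptions.
WEATHER_CSV = """station,date,avg_temp,min_temp,max_temp,daily_precip
133,2016-07-01,24.5,21.0,28.3,12.5
133,2016-07-02,26.1,22.4,30.0,
"""

def parse_float(text):
    """Return a float, or None when the field is empty."""
    return float(text) if text.strip() else None

def load_weather(source):
    """Read weather rows from a file-like object into a list of dicts."""
    rows = []
    for row in csv.DictReader(source):
        rows.append({
            "station": row["station"],
            "date": row["date"],
            "avg_temp": parse_float(row["avg_temp"]),
            "min_temp": parse_float(row["min_temp"]),
            "max_temp": parse_float(row["max_temp"]),
            "daily_precip": parse_float(row["daily_precip"]),
        })
    return rows

weather_rows = load_weather(io.StringIO(WEATHER_CSV))
```

Empty fields are kept as `None` rather than 0 so that a missing precipitation reading is not mistaken for a dry day.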

The merged data generation unit 120 will now be described with reference to FIG. 5.

FIG. 5 is a view for explaining the merged data generation unit included in the traffic congestion section estimating apparatus according to the embodiment of the present invention.

So that the machine learning unit 130 can generate the decision tree model, the merged data generation unit 120 performs preprocessing by merging the weather data, the traffic data, and the road location data into merged data, in order to efficiently organize and reduce the various data needed to build the data set.

For example, to preprocess the data region by region, the merged data generation unit 120 divides the weather data by major city (highway) and consolidates the times of the congestion sections and the times in the weather data into daily units.

For example, the weather data is formatted as daily values such as precipitation, sunshine, and temperature, whereas the traffic data contains traffic information at 5-minute intervals, so the merged data generation unit 120 can preprocess the data in order to merge them.

For example, the preprocessing performed by the merged data generation unit 120 is as shown in FIG. 5. The merged data generation unit 120 first combines the traffic data with the road location data, which includes a list of Road ID values for each region, and then combines the result with the weather data by date to generate the merged data.
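The two-step merge described above can be sketched as follows. This is a simplified illustration under stated assumptions: the field names, the region labels, the use of the daily mean speed as the aggregate, and the function name `merge_data` are hypothetical and are not taken from FIG. 5.

```python
from collections import defaultdict

# Hypothetical inputs; field names and values are illustrative only.
traffic = [  # 5-minute traffic records
    {"road_id": "L1", "timestamp": "2016-07-01 08:00", "speed": 40.0},
    {"road_id": "L1", "timestamp": "2016-07-01 08:05", "speed": 60.0},
    {"road_id": "L2", "timestamp": "2016-07-01 08:00", "speed": 90.0},
]
region_roads = {"region_a": ["L1"], "region_b": ["L2"]}  # Road ID list per region
weather = {  # (region, date) -> daily weather values
    ("region_a", "2016-07-01"): {"avg_temp": 24.5, "daily_precip": 12.5},
    ("region_b", "2016-07-01"): {"avg_temp": 25.0, "daily_precip": 0.0},
}

def merge_data(traffic, region_roads, weather):
    """Join traffic with road regions, roll up to days, then join weather."""
    road_to_region = {road: region
                      for region, roads in region_roads.items()
                      for road in roads}

    # Step 1: attach the region to each traffic record via its Road ID
    # and collect the 5-minute speeds per (region, road, day).
    speeds = defaultdict(list)
    for rec in traffic:
        region = road_to_region.get(rec["road_id"])
        if region is None:
            continue  # no location entry for this road
        day = rec["timestamp"].split(" ")[0]
        speeds[(region, rec["road_id"], day)].append(rec["speed"])

    # Step 2: aggregate to one row per day and join the daily weather by date.
    merged = []
    for (region, road_id, day), values in sorted(speeds.items()):
        daily_weather = weather.get((region, day))
        if daily_weather is None:
            continue  # no weather observation for that region and day
        merged.append({"region": region, "road_id": road_id, "date": day,
                       "mean_speed": sum(values) / len(values), **daily_weather})
    return merged

merged_rows = merge_data(traffic, region_roads, weather)
```

Rolling the 5-minute records up to one row per day before the second join is what lets the daily weather values line up with the traffic values on a common date key.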

The machine learning unit 130 will now be described with reference to FIGS. 6 to 8.

FIGS. 6 to 8 are views for explaining the machine learning unit included in the traffic congestion section estimating apparatus according to the embodiment of the present invention.

The machine learning unit 130 can use the merged data, which combines the weather data and the traffic data, to generate a decision tree model that predicts traffic speed and traffic accidents according to the weather. To apply this prediction technique, the machine learning unit 130 may use the decision tree function included in the C5.0 library available in R. The C5.0 library may use information gain in its tree-building algorithm, but the present invention is not limited thereto.

Decision tree learning uses a decision tree as a predictive model that links observations about an item to its target value. Such a decision tree can be used to show the decision-making process and the resulting decisions in a visual and explicit way.

For example, decision tree learning is a method used in data mining that aims to create a model predicting the value of a target variable from several input variables. In the tree structure, each internal node corresponds to one input variable, each branch leading to a child node corresponds to a possible value of that input variable, and each leaf node corresponds to the value of the target variable given the input values along the path from the root node to that leaf.
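The node, branch, and leaf structure just described can be sketched as a nested dictionary with a small traversal function. The tree below is hypothetical: its split variables, branch values, and the name `predict` are chosen for illustration and do not reproduce the model of FIG. 8.

```python
# A hypothetical tree; the split variables and values are illustrative only.
# Internal node: {"feature": name, "branches": {value: subtree}}.
# Leaf node: a plain string holding the target value.
tree = {
    "feature": "precipitation",
    "branches": {
        "none": "Fast",
        "rain": {
            "feature": "temperature",
            "branches": {"warm": "Normal", "cold": "Slow"},
        },
        "snow": "Slow",
    },
}

def predict(node, sample):
    """Follow the branch matching each input value until a leaf is reached."""
    while isinstance(node, dict):
        value = sample[node["feature"]]
        node = node["branches"][value]
    return node
```

For instance, `predict(tree, {"precipitation": "rain", "temperature": "cold"})` walks from the root through the `rain` branch to the `cold` leaf.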

Regarding information gain, when the impurity of a given data set is defined as entropy, the most impure state can be represented as 1 and a state consisting of only a single class as 0. Information gain is a function used to calculate the difference between two probability distributions; it can refer to a function that calculates the difference in information entropy that can arise when, for some ideal distribution, sampling is performed using another distribution that approximates it. Entropy is calculated as in Equation 1 below, and information gain is calculated as in Equation 2 below.

[Equation 1]

[Equation 2]
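For reference, entropy and information gain are commonly defined as

\[ H(S) = -\sum_{i} p_i \log_2 p_i, \qquad \mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v), \]

where \(p_i\) is the proportion of class \(i\) in the data set \(S\) and \(S_v\) is the subset of \(S\) in which attribute \(A\) takes the value \(v\). These are the textbook forms and are an assumption here; the notation of Equations 1 and 2 may differ. A minimal Python sketch of both quantities, with hypothetical function and field names:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def information_gain(rows, feature, target):
    """Entropy of the target minus the weighted entropy after splitting."""
    base = entropy([row[target] for row in rows])
    subsets = {}
    for row in rows:
        subsets.setdefault(row[feature], []).append(row[target])
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return base - remainder

# Hypothetical rows in which rain fully determines the speed class.
rows = [
    {"rain": "yes", "speed": "Slow"},
    {"rain": "yes", "speed": "Slow"},
    {"rain": "no", "speed": "Fast"},
    {"rain": "no", "speed": "Fast"},
]
gain = information_gain(rows, "rain", "speed")
```

With two equally likely classes the entropy is 1, and with a single class it is 0, which matches the two extreme states described above.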

For example, for machine learning the machine learning unit 130 uses the data produced by the analysis system, and it needs both training data, which is used for learning, and testing data, which is used to check the results of the trained decision tree.

For example, the machine learning unit 130 may use a bootstrapping technique to reduce the instability of a conventional decision tree model and increase its accuracy.

Bootstrapping can refer to applying random sampling before testing a hypothesis or computing a metric, and it is characterized by sampling with replacement, that is, duplicates must be allowed.

When bootstrapping is used, the machine learning unit 130 can repeat the operation of extracting data by random sampling a preset number of times and use the averaged data. For example, the result of bootstrapping the relationship between temperature and the ozone layer may be as shown in FIG. 6, where the gray lines are the randomly sampled data and the red line is the data after bootstrapping, but the present invention is not limited thereto.
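The repeat-and-average idea can be sketched with Python's standard library. The sample values, the number of rounds, and the function name `bootstrap_mean` are assumptions for illustration; the example averages a simple statistic (the mean) rather than fitted curves as in FIG. 6.

```python
import random
import statistics

def bootstrap_mean(values, n_rounds=200, seed=0):
    """Average the means of n_rounds resamples drawn with replacement."""
    rng = random.Random(seed)
    round_means = []
    for _ in range(n_rounds):
        # Sampling with replacement: the same value may be drawn repeatedly.
        resample = rng.choices(values, k=len(values))
        round_means.append(statistics.fmean(resample))
    return statistics.fmean(round_means), round_means

speeds = [42.0, 55.0, 61.0, 38.0, 70.0, 49.0]  # hypothetical speeds in km/h
estimate, round_means = bootstrap_mean(speeds)
```

Each entry of `round_means` plays the role of one gray line, and `estimate`, their average, plays the role of the red line.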

For example, the machine learning unit 130 can use decision tree modeling to predict the relationship between the temperature, snow, and rain in the weather data and the vehicle speed in the traffic data. In the decision tree model, the vehicle speed (0 to 100 km/h) can be divided into the classes Fast, Normal, and Slow.
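Turning a continuous speed into one of the three classes might look like the sketch below. The cut-off values and the name `speed_class` are hypothetical, since the description does not state where the class boundaries lie.

```python
# Hypothetical cut-offs in km/h; the description does not state the boundaries.
SLOW_BELOW = 30.0
FAST_FROM = 70.0

def speed_class(speed_kmh):
    """Map a speed in the 0 to 100 km/h range to Slow, Normal, or Fast."""
    if speed_kmh < SLOW_BELOW:
        return "Slow"
    if speed_kmh < FAST_FROM:
        return "Normal"
    return "Fast"
```

Discretizing the target this way turns speed prediction into a three-class classification problem, which is the form a classification tree expects.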

For example, the machine learning unit 130 may use the result table obtained by feeding the testing data to the predict() function. The accuracy and result table of the prediction model are as shown in FIG. 7, and the decision tree model is as shown in FIG. 8; the accuracy is given as 32.2%, which is described as showing a low error rate.
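A result table and accuracy of the kind described can be computed from the actual and predicted labels of the testing data, as in the sketch below. The labels are made up for illustration and do not reproduce FIG. 7; the function names `result_table` and `accuracy` are likewise assumptions.

```python
from collections import Counter

def result_table(actual, predicted):
    """Count (actual, predicted) label pairs, i.e. a confusion table."""
    return Counter(zip(actual, predicted))

def accuracy(actual, predicted):
    """Fraction of test samples whose predicted label matches the actual one."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Hypothetical labels for five test samples.
actual = ["Fast", "Slow", "Normal", "Slow", "Fast"]
predicted = ["Fast", "Slow", "Slow", "Slow", "Normal"]

table = result_table(actual, predicted)
acc = accuracy(actual, predicted)
error_rate = 1.0 - acc
```

Accuracy and error rate are complements of each other, so a reported percentage should be read with care as to which of the two it denotes.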

That is, the apparatus 100 according to the embodiment of the present invention can estimate the correlation between the collected weather data and traffic data from the prediction results of the decision tree modeling.

That is, through the apparatus 100 according to the embodiment of the present invention, a user can make use of predicted traffic flow information according to the weather on a given road. As a new way of forecasting and providing road traffic information, this has the effect of allowing users to be guided along the optimal route based on the expected road traffic information.

Although preferred embodiments of the present invention have been described above, they may be modified in various forms, and it is understood that a person of ordinary skill in the art can implement various variations and modifications without departing from the scope of the claims of the present invention.

100: Traffic congestion section estimating apparatus using machine learning
110: Data collection unit
120: Merged data generation unit
130: Machine learning unit

Claims

An apparatus for estimating traffic congestion sections using machine learning, comprising:
a data collection unit that collects at least one of weather data, traffic data, and road location data;
a merged data generation unit that merges the traffic data, the road location data, and the weather data to generate merged data; and
a machine learning unit that learns from the merged data by machine learning and generates a decision tree model that predicts traffic congestion sections by weather.