KR20170079648A

KR20170079648A - Analysis system for predicting future risks

Info

Publication number: KR20170079648A
Application number: KR1020150190440A
Authority: KR
Inventors: 김도우; 김대곤; 김좌현; 박상진; 정재학; 이종설
Original assignee: 대한민국(국민안전처 국립재난안전연구원장)
Priority date: 2015-12-30
Filing date: 2015-12-30
Publication date: 2017-07-10
Also published as: KR101911466B1

Abstract

A future risk change prediction analysis system according to the present invention comprises: a data collection unit (20) including a relay server (22) that filters, by disaster-related keywords, news from domestic and foreign media outlets and DBs provided by domestic and foreign disaster-related academic societies, a first DB server (21) that stores the disaster-related keywords, and a second DB server (23) that stores an internal data DB and DBs made publicly available by government agencies; a data preprocessing unit (30) that integrates the external data with the DBs stored in the second DB server; a data analysis unit (40) that performs text mining on the collected data to derive quantified analysis results; an analysis DB server that stores the analysis results produced by the data analysis unit (40); and a display unit (60) that visualizes and displays the analysis results stored in the analysis DB server.
The data analysis unit (40) may include a refinement unit (41), a classification unit (42), a DB storage unit (43), a first analysis unit (441), a second analysis unit (442), a third analysis unit (443), and a fourth analysis unit (444). The display unit (60) may include a scanning unit (61), a monitoring unit (62), a comparative analysis unit (63), an issue tracking unit (64), and a paper search unit (65).

Description

{Analysis system for predicting future risks}

The present invention relates to an analysis system, and an analysis method, for predicting changes in future risk based on big data collected by a data collection unit. More specifically, it relates to a future risk change prediction analysis system that text-mines the data collected by the data collection unit using disaster-related keywords, groups and categorizes the results, analyzes them, and visualizes them for display, and to a future risk change prediction analysis method using that system.

Modern society has recently become more pluralistic and complex, and mid- to long-term risk factors are increasing: global warming and the frequent extreme weather events it causes; the depletion of natural resources such as fossil fuels (oil, coal) and the rare earths essential to electronic products, or sharp swings in resource prices driven by such depletion; financial crises such as the 2008 Lehman Brothers collapse in the United States; the outbreak and spread of high-risk infectious diseases, as revealed by the MERS outbreak; and the growing threat of terrorism and the outbreak of war arising from political and religious conflict. As a result, there is a growing need to explore and analyze future risk factors scientifically and statistically on the basis of past and present ones. Countries around the world have accordingly elevated horizon scanning to the national level in order to respond preemptively, at the policy level, to the challenges and risk factors of future society. In line with this global trend, there is an emerging need to explore societal risk factors through past and present risks and to survey the global environment using scientific and statistical methods.

Among the many current risk factors described above, disasters and accidents can be regarded as the most destructive and influential in modern society. To predict and prepare for them, approaches that use big data collected anonymously from SNS, individuals' search histories, and the like have been emerging alongside the recent rapid development of electronic communications.

In particular, Korea's smartphone penetration rate is 83%, the fourth highest in the world (as of March 2015). Attempts to use the big data accumulated through smartphones to survey disaster damage (see Patent Document 2 below) or to send warnings to individuals when a disaster occurs (see Patent Document 1 below) have existed for some time and have contributed greatly to disaster preparedness and safety management in everyday life.

However, technologies that use such big data always raise issues of personal privacy, and they have been controversial in that large corporations or the state could become a Big Brother that monitors individuals. In addition, because they rest on statistical data about large numbers of individuals, they have been limited in their ability to support forecasting grounded in expert knowledge or reliable, systematic analysis of future issues.

Patent Document 1: Korean Patent Laid-Open Publication No. 10-2015-0045771 (published April 29, 2015; title of invention: "Smart Disaster Management System for Integrated Disaster Management")
Patent Document 2: Korean Patent Laid-Open Publication No. 10-2014-0032205 (published March 14, 2014; title of invention: "Mobile-Based Disaster Damage Investigation System and Method")

To solve the above problems, one object of the present invention is to collect data through text mining from external data, such as domestic and foreign news and papers of major domestic and foreign disaster-related academic societies, and from the internal report DB of the National Disaster Management Research Institute (국립재난안전연구원) and other government agency DBs.

Another object of the present invention is to provide a reliable and systematic future risk analysis system by filtering the collected data with disaster-related keywords, performing document classification and document clustering, and presenting the results on a display unit, thereby predicting future risks in advance and identifying rational response scenarios.

Yet another object of the present invention is to present disaster-related trends, visualized by disaster category, on the display unit so that a user can see at a glance how recent disaster trends are changing.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those of ordinary skill in the art from the following description.

To achieve the above objects, the future risk change prediction analysis system of the present invention comprises: a data collection unit (20) including a relay server (22) that filters big data by disaster-related keywords and a first DB server (21) that stores the disaster-related keywords; a data analysis unit (40) that performs text mining on the collected data to derive quantified analysis results; an analysis DB (50) that stores the analysis results produced by the data analysis unit (40); and a display unit (60) that visualizes and displays the stored analysis results, wherein the big data may be news from domestic and foreign media outlets and DBs provided by domestic and foreign disaster-related academic societies.

The data collection unit (20) may further include a second DB server (23) that stores an internal data DB and DBs made publicly available by government agencies.

The system may further include a data preprocessing unit (30) that performs ETL (Extraction, Transformation, Loading) on the internal data DB and the DBs made publicly available by government agencies, and then integrates them with the external data filtered by the relay server (22) to generate integrated data.
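The ETL-and-integrate step can be pictured with a minimal sketch. The specification does not define any record layout, so the internal schema (`TITLE`, `BODY`, `DATE`), the common schema, and the function names `transform` and `integrate` are all assumptions made purely for illustration.

```python
def transform(record):
    """Map one internal record to a common schema.

    Hypothetical internal layout (not from the specification):
    {"TITLE": str, "BODY": str, "DATE": "YYYYMMDD"}.
    """
    d = record["DATE"]
    return {
        "title": record["TITLE"].strip(),
        "text": record["BODY"].strip(),
        "date": f"{d[:4]}-{d[4:6]}-{d[6:]}",  # normalize to ISO format
        "origin": "internal",
    }


def integrate(internal_records, external_docs):
    """Load transformed internal records and merge them with external
    documents that are assumed to be in the common schema already."""
    loaded = [transform(r) for r in internal_records]
    merged = loaded + [dict(doc, origin="external") for doc in external_docs]
    # ISO dates sort correctly as plain strings.
    return sorted(merged, key=lambda doc: doc["date"])


internal = [{"TITLE": " Report ", "BODY": "flood study", "DATE": "20150301"}]
external = [{"title": "News", "text": "typhoon", "date": "2015-02-01"}]
integrated = integrate(internal, external)
```

Tagging each record with its origin keeps the later news/paper/internal distinction available to downstream units without a second lookup.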

The relay server (22) may consist of a collection adapter that collects data by filtering external news with the disaster-related keywords, and an external file server that stores the filtered data.

The disaster-related keywords may include keywords such as heavy rain, typhoon, flood, strong wind, yellow dust, high waves, landslide, heat wave, cold wave, tidal surge, earthquake, drought, heavy snow, lightning strike, hail, volcanic eruption, space hazard, algae, livestock disease, financial computing, infectious disease, radio-wave disaster, information and communications, CBR (chemical, biological, radiological) accident, water supply, transportation, energy, health care, traffic accident, explosion, terrorism, war, fire, marine pollution accident, water pollution accident, aviation accident, maritime accident, nuclear power plant accident, national infrastructure, and collapse, together with the near-synonyms and synonyms of those keywords.
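As a rough illustration of keyword filtering with synonyms, the sketch below keeps only the articles that mention at least one keyword or one of its variants. The specification names the keywords but gives no synonym lists and no matching rule, so the `KEYWORDS` table, the substring matching, and the function names are hypothetical.

```python
# Hypothetical table: canonical keyword -> synonyms / near-synonyms.
# The variant lists are invented for illustration only.
KEYWORDS = {
    "typhoon": {"typhoon", "tropical cyclone"},
    "flood": {"flood", "flooding", "inundation"},
    "earthquake": {"earthquake", "quake", "seismic"},
}


def matched_keywords(text, table=KEYWORDS):
    """Return the canonical keywords whose variants occur in the text."""
    lowered = text.lower()
    return sorted(
        keyword
        for keyword, variants in table.items()
        if any(variant in lowered for variant in variants)
    )


def filter_articles(articles, table=KEYWORDS):
    """Keep (article, matched keywords) pairs for disaster-related articles."""
    kept = []
    for article in articles:
        hits = matched_keywords(article, table)
        if hits:
            kept.append((article, hits))
    return kept


sample = ["Typhoon warning issued", "Stock prices rose", "Flooding after quake"]
filtered = filter_articles(sample)
```

Plain substring matching is the simplest possible rule; a real Korean-language pipeline would more plausibly match on morphologically analyzed tokens.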

The data analysis unit (40) may include: a refinement unit (41) that performs natural language processing (NLP) on the external data filtered by the data collection unit (20) or on the integrated data from the data preprocessing unit (30); a classification unit (42) that performs document classification on the refined data by category; a DB storage unit (43) that stores data on the disaster-related categories used for the classification; a first analysis unit (441) that performs document clustering on the documents classified by category; and a second analysis unit (442) that extracts information from the documents clustered by the first analysis unit (441).

The data analysis unit (40) may further include a third analysis unit (443) that analyzes the information extracted by the second analysis unit (442) and quantifies it by disaster type on the basis of the year-on-year increase.
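One way to read "quantifying by disaster type on the basis of the year-on-year increase" is as a relative change in document counts. The specification does not give a formula, so the relative-change definition, the `(year, disaster_type)` input shape, and the function name below are assumptions.

```python
from collections import Counter


def year_on_year_increase(docs, year):
    """Relative change in document count per disaster type versus the prior year.

    docs: iterable of (year, disaster_type) pairs (hypothetical input shape).
    Returns None for a type with no documents in the prior year, because the
    relative change is undefined in that case.
    """
    counts = Counter(docs)
    types = sorted({disaster_type for _, disaster_type in counts})
    result = {}
    for disaster_type in types:
        previous = counts.get((year - 1, disaster_type), 0)
        current = counts.get((year, disaster_type), 0)
        result[disaster_type] = (
            None if previous == 0 else (current - previous) / previous
        )
    return result


docs = [(2014, "flood")] * 2 + [(2015, "flood")] * 3 + [(2015, "typhoon")]
increase = year_on_year_increase(docs, 2015)
```

In this sample, flood coverage grows from two documents to three, a relative increase of 0.5, while typhoon has no 2014 baseline and is reported as `None`.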

The data analysis unit (40) may also include a fourth analysis unit (444) that applies the Sorensen-Dice coefficient algorithm to the analysis results quantified by the third analysis unit (443) in order to trace the diffusion path of a specific issue.
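The Sorensen-Dice coefficient of two sets A and B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (disjoint) to 1 (identical). The specification does not say how the coefficient is applied to trace diffusion, so the sketch below assumes one plausible use: scoring each dated document's keyword set against a seed issue and keeping the similar ones in date order. The threshold and the names `dice` and `trace_issue` are hypothetical.

```python
def dice(a, b):
    """Sorensen-Dice coefficient of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return 2 * len(a & b) / (len(a) + len(b))


def trace_issue(seed_keywords, dated_docs, threshold=0.5):
    """Return (date, score) pairs, in date order, for documents whose
    keyword sets are at least `threshold` similar to the seed issue.

    dated_docs: iterable of (iso_date, keyword_set) pairs (assumed shape).
    """
    scored = [(date, dice(seed_keywords, keywords)) for date, keywords in dated_docs]
    return sorted((date, score) for date, score in scored if score >= threshold)


seed = {"mers", "hospital"}
documents = [
    ("2015-06-02", {"mers", "hospital", "quarantine"}),
    ("2015-06-01", {"mers", "hospital"}),
    ("2015-06-03", {"typhoon"}),
]
path = trace_issue(seed, documents)
```

The resulting dated scores could feed a chart with occurrence date on one axis and a similarity-based intensity on the other, though that mapping is likewise an assumption.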

The classification unit (42) may classify the refined data into news and papers according to source, and may perform fine-grained document classification based on the disaster-related category data, taking synonyms, near-synonyms, and the like into account.

The disaster-related category data includes disaster type, social environment, and damage attribute as categories. The disaster type category may be subdivided into "heavy rain, typhoon, flood, strong wind, yellow dust, high waves, landslide, heat wave, cold wave, tidal surge, earthquake, drought, heavy snow, lightning strike, hail, volcanic eruption, space hazard, algae, livestock disease, financial computing, infectious disease, radio-wave disaster, information and communications, CBR accident, water supply, transportation, energy, health care, traffic accident, explosion, terrorism, war, fire, marine pollution accident, water pollution accident, aviation accident, maritime accident, nuclear power plant accident, national infrastructure, collapse, and their near-synonyms and synonyms". The social environment category may be subdivided into "agriculture, fisheries, forestry, livestock farming, energy, transportation, health and sanitation, water resources, public security, and their near-synonyms and synonyms". The damage attribute category may be subdivided into "livestock damage, casualties, property damage, facility damage, and their near-synonyms and synonyms".
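The two-level scheme (source first, then the three category families with their subcategories) might be sketched as follows. Only a few subcategories are included, and the synonym terms, the substring matching, and the names `CATEGORIES` and `classify` are assumptions rather than anything the specification prescribes.

```python
# Hypothetical excerpt of the category data held by the DB storage unit:
# category -> subcategory -> terms (the subcategory name plus invented synonyms).
CATEGORIES = {
    "disaster_type": {
        "typhoon": {"typhoon"},
        "flood": {"flood", "flooding"},
    },
    "social_environment": {
        "agriculture": {"agriculture", "farming"},
        "transportation": {"transportation", "traffic"},
    },
    "damage_attribute": {
        "casualties": {"casualties", "deaths"},
        "property damage": {"property damage"},
    },
}


def classify(text, source):
    """Label a document by source ('news' or 'paper') and by every
    subcategory whose terms occur in the text."""
    lowered = text.lower()
    labels = {"source": source}
    for category, subcategories in CATEGORIES.items():
        labels[category] = sorted(
            name
            for name, terms in subcategories.items()
            if any(term in lowered for term in terms)
        )
    return labels


labels = classify("Flooding destroyed farming land; deaths reported", "news")
```

A document can carry several labels at once, which matches the way the display unit later reports per-category shares rather than a single class per document.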

The display unit (60) performs ETL (Extraction, Transformation, Loading) on the analysis results stored in the analysis DB and then visualizes them, and may include: a scanning unit (61) that visualizes numeric statistics such as the cumulative news volume per disaster-related category from the classification unit (42) or the total cumulative disaster-related news volume; a monitoring unit (62) that visualizes the document groups on which the first analysis unit (441) performed document clustering and the core keywords extracted by the second analysis unit (442); a comparative analysis unit (63) that visualizes, on a single screen, the quantitative data converted by the third analysis unit (443) so that it can be compared by topic and by period; an issue tracking unit (64) that visualizes, by period, the issue intensity quantified by the fourth analysis unit; and a paper search unit (65) that has an input unit for directly searching papers in the collected data of the data collection unit (20) or in the integrated data of the data preprocessing unit (30), and that visualizes the search results.

Specifically, the scanning unit (61) may include: a statistics unit (100) that displays the cumulative news volume for each category (natural disaster, social disaster, social environment, and damage attribute), the total cumulative news volume, and statistics; an input unit (200) for specifying a region of the country and the month to display; an introduction unit (300) that, according to the information entered in the input unit (200), charts the share of analysis DB data in each subcategory, namely flood, typhoon, strong wind, heavy rain, drought, and so on within natural disasters; transportation, health and sanitation, energy, water resources, agriculture, and so on within social environment; traffic accidents, health care, information and communications, marine vessel accidents, financial computing, and so on within social disasters; and casualties, property damage, facility damage, livestock damage, and so on within damage attributes, and that makes the corresponding figures available for download; a trend unit (400) that, according to the information entered in the input unit, plots disaster trends by date as a graph and annotates the graph with the main topics classified into natural disaster, social environment, social disaster, and damage attribute; and a keyword unit (500) that, according to the information entered in the input unit, displays the main keywords derived from the data in the analysis DB in different colors according to their frequency rank.

The monitoring unit (62) may include: an input unit (110) for entering a region of the country, the type of natural disaster, social disaster, social environment, or damage attribute to display, the reference date, the month to display, and a search term; a trend unit (210) that, according to the information entered in the input unit, plots disaster trends as a graph and annotates the graph with the main topics classified into natural disaster, social environment, social disaster, and damage attribute; a disaster topics and news unit (310) that, according to the information entered in the input unit, displays the disaster topics into which news data and related academic paper data are grouped, together with the news for each topic and their common keywords, and that allows the grouped content to be downloaded and viewed in full; and a first related-word status unit (410) that, according to the information entered in the input unit, divides related words into a central keyword and related keywords and places each related keyword closer to the central keyword the higher its relevance, so that the state of related words can be seen at a glance.

The comparative analysis unit (63) may display a plurality of input units (120), each for entering a region of the country, the type of natural disaster, social disaster, social environment, or damage attribute to display, the reference date, the month to display, and a search term, and a plurality of comparison units (220, 320), each of which, according to the information entered in the input unit (120), shows the disaster trends for natural disasters, social disasters, and social environment on a single chart so that they can be compared at a glance.

The issue tracking unit (64) may include: an input unit (130) for entering a region of the country, the type of natural disaster, social disaster, social environment, or damage attribute to display, the reference date, the month to display, and a search term; an issue diffusion pattern tracking unit (230) that, according to the information entered in the input unit, plots the issue occurrence date on the horizontal axis and the issue intensity on the vertical axis to visualize the extent to which a specific issue has spread; an issue news unit (330) that, according to the information entered in the input unit, displays a list of news items on a specific issue; and a first related-word status unit (430) that, according to the information entered in the input unit, divides related words into a central keyword and related keywords and places each related keyword closer to the central keyword the higher its relevance, so that the state of related words can be seen at a glance.

When a command is entered through an input device on a disaster-related news item posted in either the disaster topics and news unit of the monitoring unit or the issue news unit of the issue tracking unit, a list of the related news articles that the first analysis unit clustered together with that news item may be displayed in a new window.

In the first related-word status unit of either the monitoring unit or the issue tracking unit, when a command is entered through an input device on one of the keywords shown, the system may further include a second related-word status unit that shows at a glance the related words most relevant to that keyword alone.

Unlike conventional systems based on statistical data about large numbers of individuals, the future risk analysis system according to the present invention, configured as described above, can provide a more expert and systematic method of future risk analysis without infringing on personal privacy.

The future risk analysis system according to the present invention can also provide a means of readily performing text mining for future risk analysis by supplying specific disaster-related keywords and disaster-related category data.

In addition, the future risk analysis system according to the present invention can provide a system in which users can easily search for the data they want, and a means of readily analyzing and understanding future risks by visualizing the search results intuitively.

FIG. 1 is a diagram schematically showing the configuration of a first embodiment of the present invention.
FIG. 2 is a diagram schematically showing the configuration of a second embodiment of the present invention.
FIG. 3 is a diagram schematically showing the configuration of the data analysis unit (40) in a third embodiment of the present invention.
FIG. 4 is a diagram schematically showing the configuration of the display unit (60) in a fourth embodiment of the present invention.
FIG. 5 is a diagram schematically showing the configuration of a fifth embodiment that combines the overall configurations of the first through fourth embodiments of the present invention.
FIG. 6 is a diagram illustrating the scanning unit (61) of the display unit (60).
FIG. 7 is a diagram illustrating the monitoring unit (62) of the display unit (60).
FIG. 8 is a diagram illustrating the comparative analysis unit (63) of the display unit (60).
FIG. 9 is a diagram illustrating the issue tracking unit (64) of the display unit (60).
FIG. 10 is a diagram illustrating the paper search unit (65) of the display unit (60).
FIG. 11 is a diagram illustrating global issues in the display unit (60).
FIG. 12 is a diagram illustrating the list of news associated with a topic that is displayed when a disaster-related news item is selected in the display unit (60).
FIG. 13 is a diagram illustrating the second related search terms displayed for a keyword when that keyword is selected in the related-word status of the display unit (60).
FIG. 14 is a diagram illustrating the entire system implemented according to a fourteenth embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the drawings.

The present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the invention pertains are fully informed of the scope of the invention; the invention is defined only by the scope of the claims. The same reference numerals are used for the same names throughout the specification. The terms used in this specification are intended to describe the embodiments and not to limit the invention. In this specification, the singular form also includes the plural, as the case may be, unless the wording specifically states otherwise. As used in the specification, "comprises" and/or "comprising" do not exclude the presence or addition of one or more elements other than those mentioned. Unless otherwise defined, all terms used in this specification may be used with the meaning commonly understood by those of ordinary skill in the art to which the invention pertains. Terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless they are clearly and specifically defined.

Other advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings.

Hereinafter, a future risk change prediction analysis system according to a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram showing the configuration of a system for analyzing future risk change prediction according to the first embodiment of the present invention.

The system for analyzing future risk change prediction according to the first embodiment of the present invention comprises: a data collection unit (20) including a relay server (22) that filters big data by disaster-related keywords and a first DB server (21) that stores the disaster-related keywords; a data analysis unit (40) that performs text mining on the collected data to derive quantified analysis results; an analysis DB (50) that stores the analysis results produced by the data analysis unit (40); and a display unit (60) that visualizes and displays the analysis results stored in the analysis DB server, wherein the big data may include "news articles from domestic and foreign media outlets and DBs provided by domestic and foreign disaster-related academic societies" (hereinafter "external data").

Here, big data refers to data generated in a digital environment that is vast in scale, is generated at short intervals, and includes not only numerical data but also text and image data. Because the volume of data has exploded compared with the past and the types of data have diversified, big data analysis makes it possible to predict not only people's behavior but also, through location information and SNS, their thoughts and opinions.

Meanwhile, the domestic and foreign news articles used as big data in the present invention may be the roughly 100 million news articles, from January 2004 to the present, of 126 media outlets searchable on portal sites such as Naver. News articles from foreign as well as domestic media outlets may be covered, but the invention is not limited to these. When selecting foreign media outlets, the reliability of the data sources may be increased by consulting country-by-country media rankings such as those on alexa.com.

The domestic and foreign disaster-related academic societies may cover roughly 100,000 papers published by 14 bodies, namely the Korean Society of Hazard Mitigation (한국방재학회), the Earthquake Engineering Society of Korea (한국지진공학회), the Korean Society of Civil Engineers (대한토목학회), the Korea Water Resources Association (한국수자원학회), the Korea Institute of Public Administration (한국행정연구원), the National Information Society Agency (한국정보화진흥원), the National Assembly Budget Office (국회예산정책처), the Korean Association for Public Administration (한국행정학회), the Korean history society 한국사학회, the Korean Political Science Association (한국정치학회), the Korean Society of Environmental Health (한국환경보건학회), the Korean Meteorological Society (한국기상학회), the Korean Society for Atmospheric Environment (한국대기환경학회), and the Korean Institute of Information Technology (한국정보기술학회), but the invention is not limited to these.

The external data may be stored in advance as a DB in a separate external data source (10). The external data source (10) may be operated by the same entity that operates the servers used in the present invention, such as the relay server (22), or by a third party.

To store the external data in the separate external data source (10), various technologies such as Flume, Sqoop, R, HBase, and Oozie may be used in addition to Hadoop. Here, Hadoop refers to a distributed file system that makes it relatively easy to use and process big data on low-cost servers and hard disks, that is, a technology that processes large volumes of data by tying multiple inexpensive computers together as if they were one.

A search engine such as Solr or Elasticsearch may be used to filter the external data stored in the external data source (10) by disaster-related keywords. Solr is a full-text search engine built on Apache Lucene, which provides the basic framework for search; it is an open-source enterprise search engine written in Java. Elasticsearch, like Solr, is an open-source distributed search engine developed on Apache Lucene and has the advantage of strong distributed-processing and real-time processing capabilities. The process may further include manually removing garbage documents, such as list pages or documents whose body text cannot be extracted, from the data stored in the external data source (10).
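The filtering step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the keyword table, the document fields (`title`, `body`), and the minimum body length used to drop garbage documents are all hypothetical, and a real deployment would delegate matching to a search engine such as Solr or Elasticsearch rather than to substring tests.

```python
from typing import Iterable

# Hypothetical keyword table: canonical disaster keyword -> terms and synonyms.
DISASTER_KEYWORDS = {
    "typhoon": ["typhoon", "tropical cyclone"],
    "flood": ["flood", "inundation"],
    "earthquake": ["earthquake", "quake"],
}


def matched_keywords(text: str, keywords=DISASTER_KEYWORDS) -> list[str]:
    """Return the canonical keywords whose term or synonym occurs in text."""
    lowered = text.lower()
    return sorted(
        name for name, terms in keywords.items()
        if any(term in lowered for term in terms)
    )


def filter_documents(docs: Iterable[dict], min_body_chars: int = 20) -> list[dict]:
    """Keep documents that have a usable body and match at least one keyword."""
    kept = []
    for doc in docs:
        body = (doc.get("body") or "").strip()
        if len(body) < min_body_chars:
            # Assumed rule: an empty or very short body marks a garbage document.
            continue
        hits = matched_keywords(doc.get("title", "") + " " + body)
        if hits:
            kept.append({**doc, "keywords": hits})
    return kept
```

Each kept document carries the list of matched keywords, which a later classification step could reuse.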

The relay server (22) may consist of a collection adapter, which collects data by filtering external data such as domestic and foreign news and papers from major domestic disaster-related societies by the disaster-related keywords, and an external file server, which stores the filtered data. Here, "server" is a general term for a computer or software that provides services to other computers on a computer network.

The disaster-related keywords used to filter the external data may include "heavy rain, typhoon, flood, strong wind, yellow dust, wind waves, landslide, heat wave, cold wave, storm surge, earthquake, drought, heavy snow, lightning, hail, volcanic eruption, space disaster, algal bloom, livestock disease, financial IT systems, infectious disease, radio-wave disaster, information and communications, CBR (chemical, biological, radiological) accident, water supply, transportation, energy, healthcare, traffic accident, explosion, terrorism, war, fire, marine pollution accident, water pollution accident, aviation accident, maritime accident, nuclear power plant accident, national infrastructure, and collapse, together with the synonyms and near-synonyms of each keyword", and are preferably stored in and used from the first DB server (21).

The data analysis unit (40) may include: a refinement unit (41) that performs natural language processing (NLP) on the external data of the data collection unit (20); a classification unit (42) that performs document classification of the refined data by category; a DB storage unit (43) that stores data on the disaster-related categories used for the classification; a first analysis unit (441) that performs document clustering on the documents classified by category; and a second analysis unit (442) that extracts information from the documents clustered by the first analysis unit (441).

The concept that the data analysis unit (40) uses to analyze the collected data is text mining. Text mining means extracting meaningful information from large volumes of text. It is also called text analytics, knowledge discovery in textual databases (KDT), or document mining. Because the data to be analyzed is unstructured, with no fixed form and difficult to handle, text mining is closely tied to natural language processing (NLP), in which a computer recognizes and processes human language.

More specifically, text mining can be divided into document classification, document clustering, metadata extraction, and information extraction. Document classification sorts documents by subject when the classification scheme is known in advance, much as a library sorts books by subject, while document clustering preferably groups documents of a similar nature into the same cluster. Information extraction may mean automatically extracting significant information from a document.

The refinement unit (41), a component of the data analysis unit (40), refines the unstructured data collected in the data preprocessing unit (30) through natural language processing (NLP). Refinement here means extracting the core keywords from the large volume of collected data. The refinement process thus extracts only what the actual analysis needs from the mass of collected data, and it is a very important part of big data analysis. Natural language is the language people use in everyday life, and natural language processing (NLP) is the technology that converts such natural language into an artificial representation that a computer can recognize.

Natural language processing (NLP) consists of a natural language understanding stage, which proceeds through morphological analysis (breaking existing data such as natural language into morphemes), syntactic parsing, semantic analysis, and discourse analysis, followed by a natural language generation stage, which renders the analysis results as text, speech, graphics, and the like for the user's convenience.

The classification unit (42), a component of the data analysis unit (40), may measure document similarity for the refined data against the data on disaster-related categories and automatically perform document classification by category. First, the refined data is divided into news and papers according to its source; document classification is then performed against the data on disaster-related categories, taking synonyms and near-synonyms into account.
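As a rough illustration of category-based classification, the sketch below scores a document against each category by counting tokens that match the category's terms and synonyms. The text does not specify the similarity measure, so this overlap count is an assumption, and the category table and the `classify` function are hypothetical.

```python
import re
from typing import Optional

# Hypothetical category table: category name -> keyword and synonym set.
CATEGORY_TERMS = {
    "typhoon": {"typhoon", "cyclone", "gale"},
    "earthquake": {"earthquake", "quake", "seismic", "aftershock"},
    "drought": {"drought", "arid", "reservoir"},
}


def classify(text: str, categories=CATEGORY_TERMS) -> Optional[str]:
    """Return the category with the most matching tokens, or None if no match."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = {
        name: sum(1 for tok in tokens if tok in terms)
        for name, terms in categories.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Counting synonym hits per category is the simplest way to let synonyms contribute to the same category score; a production system would more likely use weighted term vectors.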

The "data on disaster-related categories" that serves as the classification criterion of the classification unit (42) is stored in the DB storage unit (43) and can be broadly divided into disaster type, social environment, damage attribute, and so on.

Among the data on disaster-related categories, "disaster type" may be subdivided into categories such as heavy rain, typhoon, flood, strong wind, yellow dust, wind waves, landslide, heat wave, cold wave, storm surge, earthquake, drought, heavy snow, lightning, hail, volcanic eruption, space disaster, algal bloom, livestock disease, financial IT systems, infectious disease, radio-wave disaster, information and communications, CBR accident, water supply, transportation, energy, healthcare, traffic accident, explosion, terrorism, war, fire, marine pollution accident, water pollution accident, aviation accident, maritime accident, nuclear power plant accident, national infrastructure, and collapse, together with the synonyms and near-synonyms of each category.

Among the data on disaster-related categories, "social environment" may be subdivided into categories such as agriculture, fisheries, forestry, livestock farming, energy, transportation, health and sanitation, water resources, and public security, together with the synonyms and near-synonyms of each category.

Among the data on disaster-related categories, "damage attribute" may be subdivided into categories such as livestock damage, casualties, property damage, and facility damage, together with the synonyms and near-synonyms of each category.

The first analysis unit (441), a component of the data analysis unit (40), may be where the various data classified by category in the classification unit (42) is brought together and grouped through document clustering. For example, disaster-related news may be grouped with papers related to the same disaster.

The second analysis unit (442), a component of the data analysis unit (40), performs information extraction on the documents clustered and grouped by the first analysis unit (441); for example, it extracts core keywords such as region, affected target, disaster type, and damage amount from disaster-related news.

FIG. 2 is a diagram showing the configuration of a system for future risk change prediction analysis according to a second embodiment of the present invention.

The system for future risk change prediction analysis according to the second embodiment of the present invention may further include a second DB server (23) and a data preprocessing unit (30) in addition to the system according to the first embodiment shown in FIG. 1.

Specifically, in the present invention configured as in the first embodiment, the data collection unit (20) may further include a second DB server (23) that stores an "internal data DB" and "DBs provided to the public by government agencies".

It may further include a data preprocessing unit (30) that performs ETL (Extraction, Transformation, Loading) on the internal data DB and the DBs provided to the public by government agencies and then integrates the result with the external data filtered by the relay server (22).

Here, the "internal data DB" may mean a disaster-related DB built internally by the operating entity of the present invention. In particular, the operating entity is preferably the National Disaster Safety Research Institute (국립재난안전연구원). The disaster-related DB built internally by the Institute preferably includes not only the DB of disaster-related studies, experiments, and papers that the Institute led or took part in and the reports published by the Institute, but also every disaster-related DB stored on the Institute's internal servers.

The "DBs provided to the public by government agencies" may be based on the DBs of the various agencies made available under "Government 3.0", such as the Korea Meteorological Administration and Statistics Korea.

"Government 3.0" is a new paradigm of government operation that opens and shares public information, removes the barriers between ministries, and promotes communication and cooperation, thereby securing momentum for national policy tasks, providing services tailored to citizens, and at the same time supporting job creation and the creative economy. A further feature is that it lets the private sector make varied use of public data; to make that use more convenient, it aims to standardize databases, establish an open platform, and improve the legal framework.

The DBs of the various agencies provided under "Government 3.0" mean the information disclosed by the National Institute of Environmental Research, the Ministry of Land, Infrastructure and Transport, the Ministry of Government Administration and Home Affairs, the Korea Institute of Nuclear Safety, the Ministry of Environment, and others; the data opened by the Ministry of Land, Infrastructure and Transport, the Ministry of Government Administration and Home Affairs, the National Information Society Agency, and others; and the DBs provided by Minwon24 (민원24), Gukmin Sinmungo (국민신문고), and the like (see http://www.gov30.go.kr/gov30/int/intro6.do). These may change with government policy and are not limited to the agencies listed above.

In the future risk change prediction analysis system and method according to the second embodiment, a data preprocessing unit (30) may further be included that performs ETL (Extraction, Transformation, Loading) on the internal data DB of the second DB server (23) and the DBs provided to the public by government agencies, and then integrates the result with the external data filtered by the relay server (22).

ETL (Extraction, Transformation, Loading) refers to the whole process, carried out when building a data warehouse (DW), of extracting data from operational systems, processing it (transformation and cleansing), and loading it into the data warehouse. Here the data warehouse is preferably understood to be the data preprocessing unit (30), in which the internal and external data are integrated. The ETL process is preferably used when the volume of data is large enough to affect construction of the DB.
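The three ETL stages can be illustrated with a minimal sketch using only the Python standard library. Everything specific here is an assumption made for illustration: the CSV layout (`station`, `date`, `rain_mm`), the `rainfall` table, and the cleansing rule that drops rows with a missing station or value. An in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3


def extract(csv_text: str) -> list[dict]:
    """Extraction: read raw rows from a CSV export of a source system."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transformation: trim whitespace, drop incomplete rows, convert types."""
    cleaned = []
    for row in rows:
        station = row["station"].strip()
        rain = row["rain_mm"].strip()
        if not station or not rain:
            continue  # assumed cleansing rule: skip rows with missing fields
        cleaned.append((station, row["date"].strip(), float(rain)))
    return cleaned


def load(conn: sqlite3.Connection, records: list[tuple]) -> None:
    """Loading: write the cleaned records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rainfall (station TEXT, date TEXT, rain_mm REAL)"
    )
    conn.executemany("INSERT INTO rainfall VALUES (?, ?, ?)", records)
    conn.commit()
```

A run chains the stages as `load(conn, transform(extract(raw_csv)))`, where `conn` is a `sqlite3` connection.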

The data analysis unit (40) according to the second embodiment may include: a refinement unit (41) that performs natural language processing (NLP) on the integrated internal and external data of the data preprocessing unit (30); a classification unit (42) that performs document classification of the refined data by category; a DB storage unit (43) that stores data on the disaster-related categories used for the classification; a first analysis unit (441) that performs document clustering on the documents classified by category; and a second analysis unit (442) that extracts information from the documents clustered by the first analysis unit (441). The remainder is as described for the first embodiment, so its description is omitted.

FIG. 3 is a diagram showing the configuration of the data analysis unit (40) of a future risk change prediction analysis system according to a third embodiment of the present invention.

In the future risk change prediction analysis system according to the third embodiment of the present invention, the data analysis unit (40) may include, in addition to the refinement unit (41), classification unit (42), DB storage unit (43), first analysis unit (441), and second analysis unit (442) described in the first and second embodiments, a third analysis unit (443) that analyzes the information extracted by the second analysis unit (442) and quantifies it by disaster type on the basis of the year-over-year trend, and may further include a fourth analysis unit (444) that tracks the diffusion path of a specific issue using the analysis results quantified by the third analysis unit (443).

The third analysis unit (443) may convert the core keywords extracted by the second analysis unit (442) into quantitative data and compare them with earlier data to quantify the relative degree of increase or decrease. The quantitative data means the number of news items published per date or per disaster type, and its graphical representation can be regarded as a "disaster trend".
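A minimal sketch of this quantification step follows, assuming the input is a list of (publication date, disaster type) pairs; the function names and the choice to report the change as a fraction of the previous year's count are illustrative assumptions, not details given in the text.

```python
from collections import Counter
from typing import Iterable, Optional


def yearly_counts(articles: Iterable[tuple[str, str]]) -> Counter:
    """Count news items per (year, disaster type) from ('YYYY-MM-DD', type) pairs."""
    return Counter((int(date[:4]), disaster) for date, disaster in articles)


def year_over_year_change(counts: Counter, year: int, disaster: str) -> Optional[float]:
    """Relative change against the previous year, or None if there is no baseline."""
    previous = counts.get((year - 1, disaster), 0)
    current = counts.get((year, disaster), 0)
    if previous == 0:
        return None  # a relative change is undefined without a prior-year count
    return (current - previous) / previous
```

Plotting `yearly_counts` per disaster type over time would give the kind of curve the text calls a disaster trend.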

The fourth analysis unit (444) may, using the analysis results quantified by the third analysis unit (443), apply one or more of a diffusion path prediction model based on a shock model, a nonlinear regression analysis (NLIN) model, an exponentially weighted moving average model, and the Sørensen-Dice coefficient algorithm to track the diffusion path of a specific issue, quantify the issue strength, and thereby predict and analyze future risk changes. In particular, diffusion path prediction in the present invention preferably uses the Sørensen-Dice coefficient algorithm based on the N-gram algorithm.
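Of the models listed, the exponentially weighted moving average is the simplest to show. The sketch below applies the standard recurrence to a series of per-period news counts; the smoothing factor `alpha` and the choice to seed the series with its first value are assumptions, since the text does not say how the model is parameterized.

```python
def ewma(values, alpha=0.3):
    """Exponentially weighted moving average: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(float(value))  # seed with the first observation
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed
```

A larger `alpha` makes the smoothed series react faster to a sudden rise in news volume, at the cost of more noise.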

The N-gram algorithm splits text into strings of N base units. The strings obtained by splitting each news item with the N-gram algorithm are compared with one another, the frequencies of identical strings are compared, and news items whose frequency is at or above a reference value can be grouped together.

Document similarity (QS) may be used as the criterion for identifying news items whose frequency is at or above the reference value. The document similarity (QS) can be computed from the lengths A and B of the two strings and the term 2C, where C counts the strings, among those obtained by splitting A and B with the N-gram algorithm, that have the same value in both.

Expressed as a formula, the above is as follows.
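The formula itself did not survive in this text. Assuming the standard Sørensen-Dice form, with A and B taken as the lengths of the two strings (in the N-gram setting, the number of N-gram strings each one yields) and C as the number of strings the two have in common, the definitions above correspond to:

$$
QS = \frac{2C}{A + B}
$$

Under this reading QS ranges from 0 (no strings in common) to 1 (identical sets of strings).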

The issue strength in the fourth analysis unit (444) refers to the set of documents similar to the initially published news item. Accordingly, figures such as the total number of such similar document sets or the number of documents within each document set may be used.
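The following sketch ties the N-gram splitting, the Sørensen-Dice similarity, and the issue strength together. It is a sketch under assumptions: character bigrams (N = 2), whitespace removal before splitting, the 0.5 threshold, and the reading of issue strength as the number of articles similar to the initial news item are all illustrative choices, and the function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Split text into overlapping character N-grams after removing whitespace."""
    compact = "".join(text.lower().split())
    return [compact[i:i + n] for i in range(len(compact) - n + 1)]


def dice_similarity(a: str, b: str, n: int = 2) -> float:
    """Sørensen-Dice coefficient 2C / (A + B) over the N-gram multisets of a and b."""
    grams_a = Counter(char_ngrams(a, n))
    grams_b = Counter(char_ngrams(b, n))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    shared = sum((grams_a & grams_b).values())  # multiset intersection size
    return 2 * shared / total


def issue_strength(seed: str, articles: list[str], threshold: float = 0.5, n: int = 2) -> int:
    """Count articles whose similarity to the initial news item meets the threshold."""
    return sum(1 for text in articles if dice_similarity(seed, text, n) >= threshold)
```

For example, "night" and "nacht" share only the bigram "ht" out of four bigrams each, so their similarity is 2 × 1 / (4 + 4) = 0.25.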

FIG. 4 is a diagram showing the configuration of the display unit (60) according to a fourth embodiment of the present invention.

The future risk analysis system according to the fourth embodiment of the present invention includes the display unit (60) of the first to third embodiments. The display unit (60) performs ETL (Extraction, Transformation, Loading) on the analysis results stored in the analysis DB and then visualizes them, and may include: a scanning unit (61) that visualizes numerical statistics from the classification unit (42), such as the cumulative news volume per disaster-related category or the total cumulative disaster-related news volume; a monitoring unit (62) that visualizes the document groups clustered by the first analysis unit (441) and the core keywords extracted by the second analysis unit (442); a comparative analysis unit (63) that visualizes the quantitative data converted by the third analysis unit (443) on a single screen so that it can be compared by topic and by period; an issue tracking unit (64) that visualizes, by period, the issue strength quantified by the fourth analysis unit (444); and a paper search unit (65) that has an input unit for directly searching the papers in the collected data of the data collection unit (20) or the integrated data of the data preprocessing unit (30) and visualizes the search results.

Conveying information effectively through visual means such as charts, so that users can easily understand the results of data analysis as the display unit (60) does, is called data visualization. Typical examples are the infographic, which summarizes a large amount of data in a single picture, and the word cloud, which visually expresses the frequency and importance of the words used in a document.

Information graphics, one form of data visualization, are also called infographics and refer to the visual representation of information, data, and knowledge. The key is to express complex information, such as that arising in signage, maps, journalism, technical reports, and education, quickly and clearly. Tools that support such visualization include programs such as Microsoft Excel and Google Spreadsheets. Programming languages for more specialized analysis include Python and PHP, along with the open-source Processing and R.

FIG. 5 is a diagram showing the configuration of a system for future risk change prediction analysis according to a fifth embodiment of the present invention.

Specifically, the fifth embodiment of FIG. 5 shows the configurations of the first to fourth embodiments and of FIGS. 1 to 4 combined into one, and its details are as described above.

FIG. 6 is a diagram illustrating the scanning unit (61) according to a sixth embodiment of the present invention.

Specifically, the scanning unit (61) illustrated in FIG. 6 may include: a statistics unit (100) that displays numerical statistics from the classification unit (42), such as the cumulative news volume per disaster-related category or the total cumulative disaster-related news volume; an input unit (200) for specifying a region nationwide and the month to be displayed; an introduction unit (300) that, according to the information entered in the input unit (200), charts the ratio of the cumulative news volume of each topic subdivided within the disaster-related categories to the cumulative news volume of its category (natural disaster, social disaster, social environment, or damage attribute) and makes the underlying figures available for download; a trend unit (400) that, according to the information entered in the input unit (200), shows the disaster trend plotting the quantitative data converted by the third analysis unit (443) by topic and by period, together with the core keywords of that quantitative data; and a keyword unit (500) that, according to the information entered in the input unit (200), visualizes the core keywords extracted by the second analysis unit (442) in different colors according to their frequency rank.

A concrete example of the introduction unit (300) is as follows.

According to the information entered in the input unit (200), the introduction unit (300) may chart, and make the figures downloadable for, the share of analysis DB data in each category: within natural disasters, flood, typhoon, strong wind, heavy rain, drought, and so on; within social environment, transportation, health and sanitation, energy, water resources, agriculture, and so on; within social disasters, traffic accident, healthcare, information and communications, marine vessel accident, financial IT systems, and so on; and within damage attributes, casualties, property damage, facility damage, livestock damage, and so on.

FIG. 7 is a diagram illustrating the monitoring unit (62) according to a seventh embodiment of the present invention.

Specifically, the monitoring unit (62) may include: an input unit (110) for entering a region nationwide, the type of natural disaster, social disaster, social environment, or damage attribute to be displayed, the reference date, the month to be displayed, and a search term; a trend unit (210) that, according to the information entered in the input unit, shows the disaster trend plotting the quantitative data converted by the third analysis unit (443) by topic and by period, together with the core keywords of that quantitative data; a disaster topic and news unit (310) that, according to the information entered in the input unit, displays the document groups clustered by the first analysis unit (441) by topic and common keywords, and is configured so that the document groups can be downloaded and the full text of each document can be viewed; and a first related-word status unit (410) that, according to the information entered in the input unit, divides the core keywords extracted by the second analysis unit (442) into a central keyword and related keywords by degree of relevance and places more relevant keywords closer to the central keyword, so that the status of related words can be seen at a glance.

FIG. 8 is a diagram illustrating the comparative analysis unit (63) according to an eighth embodiment of the present invention.

Specifically, the comparative analysis unit (63) may include a plurality of input units (120), each for entering a region nationwide, the type of natural disaster, social disaster, social environment, or damage attribute to be displayed, the reference date, the month to be displayed, and a search term. It may also include a plurality of comparison units (220, 320) that display, on a single chart for comparison at a glance, a plurality of trend units, each showing, according to the information entered in the input unit, the disaster trend plotting the quantitative data converted by the third analysis unit (443) by topic and by period together with the core keywords of that quantitative data.

FIG. 9 is a diagram illustrating the issue tracking unit 64 according to a ninth embodiment of the present invention.

Specifically, the issue tracking unit 64 is characterized in that it allows the pattern of issue diffusion to be tracked based on the results analyzed by the fourth analysis unit 444. The issue tracking unit may further include: an input unit 130 for entering the nationwide region, the type of natural disaster, social disaster, social environment, and damage attribute to display, the reference date, the month to display, and a search term; an issue diffusion pattern tracking unit 230 that, according to the information entered in the input unit 130, plots the issue occurrence date on the horizontal axis and the issue intensity on the vertical axis to visualize the extent to which a specific issue has spread; an issue news unit 330 that, according to the information entered in the input unit 130, visualizes the document groups produced for a specific issue by document clustering in the first analysis unit 441; and a first related-word status unit 430 that, according to the information entered in the input unit 130, divides the key keywords extracted by the second analysis unit 442 into a central keyword and related keywords according to their relevance, and places keywords with higher relevance closer to the central keyword so that the related-word status can be seen at a glance.
Here, the issue intensity in the issue diffusion pattern tracking unit 230 preferably means the set of documents similar to a specific issue, and the number of such documents, based on the results analyzed by the fourth analysis unit 444.
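
The notion of issue intensity as a count of documents similar to an issue can be illustrated with a small sketch. It assumes, hypothetically, that each document is reduced to a set of keywords and that "similar" means a Sørensen-Dice coefficient (the measure named in claim 10) at or above a threshold. The threshold of 0.5, the function names, and the sample data are illustrative and are not taken from the specification.

```python
from collections import Counter
from datetime import date


def dice(a: set, b: set) -> float:
    """Sørensen-Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def issue_intensity(issue_keywords, documents, threshold=0.5):
    """Count, per day, the documents similar to the issue (hypothetical rule)."""
    issue = set(issue_keywords)
    counts = Counter()
    for day, keywords in documents:
        if dice(issue, set(keywords)) >= threshold:
            counts[day] += 1
    # Sorted by date: x-axis = occurrence date, y-axis = intensity.
    return dict(sorted(counts.items()))


# Made-up sample documents: (publication date, keyword set).
docs = [
    (date(2015, 6, 1), {"mers", "hospital", "infection"}),
    (date(2015, 6, 1), {"typhoon", "flood"}),
    (date(2015, 6, 2), {"mers", "infection", "quarantine"}),
    (date(2015, 6, 2), {"mers", "hospital"}),
]
timeline = issue_intensity({"mers", "infection"}, docs)
```

Plotting the resulting dates against their counts gives the kind of date-versus-intensity chart described for the issue diffusion pattern tracking unit 230.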

FIG. 10 is a diagram illustrating the paper search unit 65 according to a tenth embodiment of the present invention.

Specifically, so that disaster-related papers in the collected data or integrated data of the data preprocessing unit 30 can be searched, the paper search unit 65 may include an input unit 140 for entering the publication period and search terms for the title, author, and abstract of a paper, and a search result display unit 240 that displays the title, author, source, and abstract of each paper.

FIG. 11 is a diagram illustrating a future risk analysis system in which the display unit 60 is displayed in English, according to an eleventh embodiment of the present invention.

The future risk analysis system according to the eleventh embodiment of the present invention includes the first to tenth embodiments, but its display language may be not only Korean, as in those embodiments, but also a foreign language such as English, Japanese, or Chinese. Foreign languages other than those listed may also be used.

FIG. 12 is a diagram illustrating a new window that displays a list of clustered news articles according to a twelfth embodiment of the present invention.

In the future risk analysis system according to the twelfth embodiment of the present invention, when, in the fourth to eleventh embodiments, a command selecting any one news item from the news list displayed on the disaster topic and news unit of the monitoring unit 62 or on the issue news unit of the issue tracking unit 64 is entered through an input device, a list of related news articles may be displayed in a new window. These are the articles that, in relation to the selected news item, were classified (document classification) by the classification unit 42 of the first embodiment and then clustered (document clustering) by the first analysis unit 441.

FIG. 13 is a diagram showing the second related-word status unit according to a thirteenth embodiment of the present invention.

The future risk analysis system according to the thirteenth embodiment of the present invention may further include a second related-word status unit. When, in the fourth to eleventh embodiments, a command is entered through an input device (by click, touch, voice command, or the like) for any one keyword displayed in the first related-word status unit 410, 430 of the monitoring unit 62 or the issue tracking unit 64, the second related-word status unit shows at a glance the highly relevant related words centered on that keyword alone.

FIG. 14 is a diagram illustrating the entire system as implemented according to a specific fourteenth embodiment of the present invention. It shows the process by which data is actually collected, processed, visualized, and displayed using all of the components and features of the present invention described above.

A future risk change prediction analysis method implemented according to a specific fifteenth embodiment of the present invention is as follows.

The future risk change prediction analysis method according to the fifteenth embodiment includes step 1-1 of filtering and collecting disaster-related data from big data; step 2 of performing text mining on the data collected in step 1-1 to derive quantified analysis results; step 3 of storing the analysis results analyzed by the data analysis unit; and step 4 of visualizing and displaying the stored analysis results, wherein the big data may be news from domestic and foreign media and DBs provided by domestic and foreign disaster-related academic societies.
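
Step 1-1 (filtering big data with disaster-related keywords) can be sketched as below. This is a minimal illustration, assuming articles arrive as dictionaries with a `text` field and that filtering is a case-insensitive substring match. The function name, the article structure, and the matching rule are assumptions, since the specification does not describe how the relay server matches keywords.

```python
def filter_disaster_news(articles, keywords):
    """Keep articles whose text mentions at least one disaster keyword.

    Hypothetical rule: case-insensitive substring match. Each kept article
    gets a "matched" list recording which keywords were found.
    """
    lowered = [k.lower() for k in keywords]
    kept = []
    for article in articles:
        text = article["text"].lower()
        hits = [k for k in lowered if k in text]
        if hits:
            kept.append({**article, "matched": hits})
    return kept


# Made-up sample articles; the keywords are a few of those listed in claim 5.
articles = [
    {"title": "Typhoon nears Jeju", "text": "A typhoon is expected to bring flood risk."},
    {"title": "Election results", "text": "Votes were counted overnight."},
]
kept = filter_disaster_news(articles, ["typhoon", "flood", "earthquake"])
```

Substring matching is deliberately naive (for example, "fire" would also match "fired"); a production filter would more likely match on tokens and on the synonym lists the first DB server is said to store.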

Here, the future risk change prediction analysis method according to the fifteenth embodiment may further include step 1-2 of integrating the data filtered and collected in step 1-1 with data on which ETL (Extraction, Transformation, Loading) has been performed from the DBs stored in the second DB server. In that case, step 2 performs text mining on the data integrated in step 1-2 to derive quantified analysis results.

The second DB server may store an 'internal data DB' and 'DBs provided to the public by government agencies'.

In addition, step 2 of the fifteenth embodiment, which performs text mining to derive quantified analysis results, preferably consists of step 2-1 of performing natural language processing (NLP) on the data collected in step 1-1 or step 1-2; step 2-2 of performing document classification by category on the data processed in step 2-1; step 2-3 of performing document clustering on the data classified in step 2-2; and step 2-4 of extracting information from the documents clustered in step 2-3.
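
Steps 2-1 to 2-4 can be pictured as a short pipeline. The sketch below uses simple stand-ins for every stage: regex tokenization in place of NLP, keyword-overlap classification, greedy clustering by shared tokens, and frequency-based keyword extraction. None of these stand-ins is claimed to be the method used in the patent. The category dictionary `CATEGORIES`, the `min_shared` threshold, and all function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical category dictionary (the role played by the DB storage unit).
CATEGORIES = {
    "natural disaster": {"typhoon", "flood", "earthquake"},
    "social disaster": {"fire", "explosion", "collapse"},
}


def tokenize(text):
    """Step 2-1 stand-in: lowercase word tokens (real NLP would do far more)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(tokens):
    """Step 2-2: pick the category sharing the most words, or None."""
    best = max(CATEGORIES, key=lambda c: len(tokens & CATEGORIES[c]))
    return best if tokens & CATEGORIES[best] else None


def cluster(token_sets, min_shared=2):
    """Step 2-3: greedy clustering against each cluster's first document."""
    clusters = []
    for tokens in token_sets:
        for group in clusters:
            if len(tokens & group[0]) >= min_shared:
                group.append(tokens)
                break
        else:
            clusters.append([tokens])
    return clusters


def key_keywords(group, top=3):
    """Step 2-4: the most frequent tokens across a cluster."""
    counts = Counter(t for tokens in group for t in tokens)
    return [word for word, _ in counts.most_common(top)]


def analyze(texts):
    """Run steps 2-1 to 2-3 and return {category: [clusters]}."""
    by_category = defaultdict(list)
    for text in texts:
        tokens = tokenize(text)
        category = classify(tokens)
        if category is not None:
            by_category[category].append(tokens)
    return {c: cluster(sets) for c, sets in by_category.items()}


result = analyze([
    "Typhoon brings flood to coast",
    "Flood after typhoon hits coast",
    "Factory fire and explosion",
])
```

Here the two typhoon articles fall into one cluster under "natural disaster", and `key_keywords` applied to that cluster returns its three shared words.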

Step 2 may further include step 2-5 of analyzing the information extracted in step 2-4 and quantifying it by disaster type based on the year-on-year increase, and step 2-6 of tracking the diffusion path of a specific issue using the analysis results quantified in step 2-5.
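
The year-on-year quantification of step 2-5 can be sketched as a percentage change in article counts per disaster type. The specification does not give a formula, so the percent-change rule, the function name, and the made-up counts below are assumptions for illustration only.

```python
def year_on_year_change(counts, year):
    """Percent change in article count per disaster type versus the prior year.

    `counts` maps disaster type -> {year: article count}. Returns None where
    the previous year has no articles, since the change is then undefined.
    """
    changes = {}
    for disaster_type, by_year in counts.items():
        previous = by_year.get(year - 1, 0)
        current = by_year.get(year, 0)
        if previous == 0:
            changes[disaster_type] = None
        else:
            changes[disaster_type] = (current - previous) / previous * 100
    return changes


# Made-up article counts per disaster type and year.
counts = {
    "typhoon": {2014: 40, 2015: 50},
    "earthquake": {2015: 3},
}
changes = year_on_year_change(counts, 2015)
```

In this example "typhoon" rises by 25 percent, while "earthquake" has no 2014 baseline and is reported as `None` rather than as an infinite increase.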

The disaster-related keywords used to filter the big data in step 1-1 and the disaster-related categories used for document classification in step 2-2 are the same as described for the future risk change prediction analysis system in the first to fourteenth embodiments, and their description is therefore omitted here.

The other specific features of the fifteenth embodiment are the same as the features of the future risk change prediction analysis system described in the first to fourteenth embodiments. They can therefore be readily understood by a person of ordinary skill in the art, and a detailed description is omitted.

10: external data source 20: data collection unit
21: first DB server 22: relay server
23: second DB server 30: data preprocessing unit
40: data analysis unit 41: refinement unit
42: classification unit 43: DB storage unit
44: n-th analysis unit 441: first analysis unit
442: second analysis unit 443: third analysis unit
444: fourth analysis unit 50: analysis DB
60: display unit 61: scanning unit
62: monitoring unit 63: comparative analysis unit
64: issue tracking unit 65: paper search unit
100: statistics unit
200, 110, 120, 130, 140, 150: input unit
210, 250, 400: trend unit 220: first trend comparison unit
230: issue diffusion pattern tracking unit 240: search result display unit
300: introduction unit 310, 350: disaster topic and news unit
320: second trend comparison unit 330: issue news unit
410, 430, 450: first related-word status unit 500: keyword unit

Claims

1. A future risk change prediction analysis system, comprising:
a data collection unit including a relay server that filters big data using disaster-related keywords, and a first DB server that stores the disaster-related keywords;
a data analysis unit that performs text mining on the collected data to derive quantified analysis results;
an analysis DB that stores the analysis results produced by the data analysis unit; and
a display unit that visualizes and displays the stored analysis results,
wherein the big data comprises news from domestic and foreign media and DBs provided by domestic and foreign disaster-related academic societies.

2. The system of claim 1,
wherein the data collection unit further includes a second DB server that stores an internal data DB and DBs provided to the public by government agencies.

3. The system of claim 2,
further comprising a data preprocessing unit that performs ETL (Extraction, Transformation, Loading) on the internal data DB and the DBs provided to the public by government agencies stored in the second DB server, and integrates the resulting data with the external data filtered by the relay server.

4. The system of claim 1,
wherein the relay server comprises a collection adapter that collects data by filtering external news using the disaster-related keywords, and an external file server that stores the filtered data.

5. The system of claim 1,
wherein the disaster-related keywords include, but are not limited to, heavy rain storms, typhoons, floods, strong winds, yellow dust storms, landslides, heat waves, cold waves, tsunamis, earthquakes, droughts, heavy snow, lightning, hail, volcanic eruptions, , radio wave disaster, information communication, NBC accident, water, traffic, energy, health care, traffic accident, explosion, terrorism, war, fire, marine environment pollution accident, water quality environmental pollution accident, air accident, infrastructure, collapse, and synonyms and near-synonyms of these keywords.

6. The system of claim 1,
wherein the data analysis unit comprises:
a refinement unit that performs natural language processing (NLP) on the data collected by the data collection unit;
a classification unit that classifies the refined data by category;
a DB storage unit that stores the disaster-related category data used for the classification;
a first analysis unit that performs document clustering on the documents classified by category; and
a second analysis unit that extracts information from the documents clustered by the first analysis unit.

7. The system of claim 3,
wherein the data analysis unit comprises:
a refinement unit that performs natural language processing (NLP) on the integrated data produced by the data preprocessing unit;
a classification unit that classifies the refined data by category;
a DB storage unit that stores the disaster-related category data used for the classification;
a first analysis unit that performs document clustering on the documents classified by category; and
a second analysis unit that extracts information from the documents clustered by the first analysis unit.

8. The system of claim 6 or 7,
wherein the data analysis unit further comprises a third analysis unit that analyzes the information extracted by the second analysis unit and quantifies it by disaster type based on the year-on-year increase.

9. The system of claim 8,
wherein the data analysis unit further comprises a fourth analysis unit that tracks the diffusion path of a specific issue using the analysis results quantified by the third analysis unit.

10. The system of claim 9,
wherein the diffusion path of the specific issue is tracked using the Sørensen-Dice coefficient algorithm.

11. The system of claim 6 or 7,
wherein the classification unit classifies the refined data into news and articles based on the source of the refined data, and performs document classification taking synonyms and near-synonyms into account based on the disaster-related category data.

12. The system of claim 6 or 7,
wherein the disaster-related category data comprises a disaster type category, a social environment category, and a damage attribute category.

13. The system of claim 12,
wherein the disaster type category is subdivided into hurricane, typhoon, flood, strong wind, yellow sand, storm, landslide, heat wave, cold wave, tsunami, earthquake, drought, heavy snow, lightning, hail, volcanic eruption, space disaster, bird, livestock disease, financial computing, epidemic, communications, NBC accidents, water, traffic, energy, health care, traffic accidents, explosions, terrorism, war, fire, marine environmental pollution accidents, water quality environmental pollution accidents, and their synonyms and near-synonyms.

14. The system of claim 12,
wherein the social environment category is subdivided into agriculture, fishery, forestry, animal husbandry, energy, transportation, health, hygiene, water resources, security, and their synonyms and near-synonyms.

15. The system of claim 12,
wherein the damage attribute category is subdivided into damage to animals, damage to property, damage to facilities, and their synonyms and near-synonyms.

16. The system of claim 1,
wherein the display unit performs ETL (Extraction, Transformation, Loading) on the analysis results stored in the analysis DB and visualizes them,
and includes a scanning unit that visualizes numerical statistics of the cumulative news volume for each of the disaster-related categories of the classification unit, or of the total cumulative disaster-related news volume.

17. The system of claim 1,
wherein the display unit performs ETL (Extraction, Transformation, Loading) on the analysis results stored in the analysis DB and visualizes them,
and includes a monitoring unit that visualizes the document groups produced by document clustering in the first analysis unit and the key keywords extracted by the second analysis unit.

18. The system of claim 1,
wherein the display unit performs ETL (Extraction, Transformation, Loading) on the analysis results stored in the analysis DB and visualizes them,
and includes a comparative analysis unit that visualizes the quantitative data converted by the third analysis unit on a single screen so that it can be compared by topic and by period.

19. The system of claim 1,
wherein the display unit performs ETL (Extraction, Transformation, Loading) on the analysis results stored in the analysis DB and visualizes them,
and includes an issue tracking unit that visualizes, by period, the issue intensity quantified by the fourth analysis unit.

20. The system of claim 1,
wherein the display unit includes a paper search unit that has an input unit for directly searching the papers among the external data collected by the data collection unit, and that visualizes the search results.

21. The system of claim 16, wherein the scanning unit comprises:
a statistics unit that displays numerical statistics of the cumulative news volume for each disaster-related category of the classification unit, or of the total cumulative news volume;
an input unit for specifying the nationwide region and the month to display;
a download unit that, according to the information entered in the input unit, charts the share of cumulative news volume for each category of natural disaster, social disaster, social environment, and damage attribute among the news classified by disaster-related category, and allows the chart to be downloaded;
a trend unit that, according to the information entered in the input unit, plots the quantitative data converted by the third analysis unit, annotated with the key keywords of that quantitative data; and
a keyword unit that, according to the information entered in the input unit, visualizes the key keywords extracted by the second analysis unit in colors that differ according to their frequency-of-use ranking.

22. The system of claim 17, wherein the monitoring unit comprises:
an input unit for entering the nationwide region, the type of natural disaster, social disaster, social environment, and damage attribute to display, the reference date, the month to display, and a search term;
a trend unit that, according to the information entered in the input unit, plots the quantitative data converted by the third analysis unit, annotated with the key keywords of that quantitative data;
a disaster topic and news unit that, according to the information entered in the input unit, displays the document groups produced by document clustering in the first analysis unit by their topics and common keywords, and allows the document groups to be downloaded; and
a first related-word status unit that, according to the information entered in the input unit, divides the key keywords extracted by the second analysis unit into a central keyword and related keywords according to their relevance, and displays them so that the related-word status can be seen at a glance.

23. The system of claim 18, wherein the comparative analysis unit comprises:
a plurality of input units for entering the nationwide region, the type of natural disaster, social disaster, social environment, and damage attribute to display, the reference date, the month to display, and a search term; and
a comparison unit that displays a plurality of trend units on a single chart so that they can be compared at a glance, each trend unit plotting, according to the information entered in the input unit, the quantitative data converted by the third analysis unit, annotated with the key keywords of that quantitative data.

24. The system of claim 19, wherein the issue tracking unit comprises:
an input unit for entering the nationwide region, the type of natural disaster, social disaster, social environment, and damage attribute to display, the reference date, the month to display, and a search term;
an issue diffusion pattern tracking unit that, according to the information entered in the input unit, plots the issue occurrence date on the horizontal axis and the issue intensity on the vertical axis to visualize the extent to which a specific issue has spread;
an issue news unit that, according to the information entered in the input unit, visualizes the document groups produced for a specific issue by document clustering in the first analysis unit; and
a first related-word status unit that, according to the information entered in the input unit, divides the key keywords extracted by the second analysis unit into a central keyword and related keywords according to their relevance, and displays them so that the related-word status can be seen at a glance.

25. The system of claim 22 or 24,
wherein, when a command is entered for any one news item in the news list displayed on the disaster topic and news unit of claim 22 or on the issue news unit of claim 24,
a list of related news articles on which document clustering has been performed in the first analysis unit in relation to that news item is displayed in a new window.

26. The system of claim 22 or 24,
further comprising a second related-word status unit that, when a command is entered for any one keyword in the first related-word status unit,
shows at a glance the highly relevant related words centered on that keyword alone.

27. A future risk change prediction analysis method, comprising:
step 1-1 of filtering and collecting disaster-related data from big data;
step 2 of performing text mining on the data collected in step 1-1 to derive quantified analysis results;
step 3 of storing the analysis results derived by the data analysis unit; and
step 4 of visualizing and displaying the stored analysis results,
wherein the big data comprises news from domestic and foreign media and DBs provided by domestic and foreign disaster-related academic societies.

28. The method of claim 27,
further comprising step 1-2 of integrating the data filtered and collected in step 1-1 with data on which ETL (Extraction, Transformation, Loading) has been performed from the DBs stored in the second DB server,
wherein step 2 performs text mining on the data integrated in step 1-2 to derive quantified analysis results.

29. The method of claim 27 or 28,
wherein step 2 comprises:
step 2-1 of performing natural language processing (NLP) on the data collected in step 1-1 or the data integrated in step 1-2;
step 2-2 of performing document classification on the data processed in step 2-1;
step 2-3 of performing document clustering on the data classified in step 2-2; and
step 2-4 of extracting information from the documents clustered in step 2-3.

30. The method of claim 29,
wherein step 2 further comprises step 2-5 of analyzing the information extracted in step 2-4 and quantifying it by disaster type based on the year-on-year increase.

31. The method of claim 30,
wherein step 2 further comprises step 2-6 of tracking the diffusion path of a specific issue using the analysis results quantified in step 2-5.