KR102556463B1

KR102556463B1 - Social advanced persistent threat prediction system and method based on attacker group similarity

Info

Publication number: KR102556463B1
Application number: KR1020210158637A
Authority: KR
Inventors: 남기효; 김윤홍; 신희성
Original assignee: (주)유엠로직스
Priority date: 2021-11-17
Filing date: 2021-11-17
Publication date: 2023-07-18
Also published as: KR20230072171A

Abstract

The present invention relates to a system and method for predicting social-issue-based cyber targeted attacks using an attacker group similarity technique. It analyzes the social issues and attacker group characteristics at the time past cyber targeted attacks occurred in order to predict social-issue-based cyber targeted attacks that have not yet occurred.

Description

Social advanced persistent threat prediction system and method based on attacker group similarity

The present invention relates to a system and method for predicting social-issue-based cyber targeted attacks using an attacker group similarity technique, and more particularly to such a system and method capable of predicting the occurrence of social-issue-based cyber targeted attacks that have not yet occurred.

A cyber targeted attack is carried out through a set of stealthy, continuous computer hacking processes by people targeting a specific entity. The target is usually an individual, an organization, a nation, or a business or political group (for example, their operating servers).

Because such attacks require a considerable degree of stealth over a long period, they use malicious software that exploits vulnerabilities in the targeted operating server (or operating system), and the attackers continuously monitor and extract data about the targets from outside in order to create that malicious software.

Among these, a social-issue-based cyber targeted attack (SAPT, Social Advanced Persistent Threat) is a cyber targeted attack that an attacker organization strategically launches against multiple attack targets connected to a particular social issue, using that issue as a pretext. Because such attacks are attempted against many targets in a chained, simultaneous, or successive manner, they cause considerable damage to society.

According to Symantec, Korea received the sixth largest number of cyber targeted attacks in the world in 2017 alone, from about 140 attacker groups, including the PyeongChang Olympics cyber attack and attacks on the defense, high-tech, and financial sectors. Whereas past cyber targeted attacks aimed to steal important internal information such as confidential data, key asset information, and personal information, modern attacks are also carried out for economic gain. In addition, attackers' purposes and means are becoming more intelligent and sophisticated, extending to paralyzing, destroying, or blackmailing systems within companies and institutions.

Conventional cyber targeted attacks are regarded as a potential national-level threat, and investment in countering them is growing. However, a social-issue-based cyber targeted attack must be countered by multiple institutions (the attack targets) rather than a single one, so existing security solutions and security frameworks have clear limitations.

That is, each server (institution, etc.) applies security techniques individually. When social-issue-based cyber targeted attacks occur simultaneously and in a chain, they are not recognized as a single attack campaign but are judged to be individual cyber targeted attacks, and each server responds on its own, so detection of the social-issue-based cyber targeted attack is delayed.

Moreover, because existing techniques are designed to detect a cyber targeted attack as it occurs and to respond quickly to prevent secondary damage, they cannot predict the occurrence of a cyber targeted attack, in particular a social-issue-based one.

Konkuk University master's thesis, "A Study on Rick Hash-based Malware Analysis for Detecting Social-Issue-based Cyber Targeted Attacks" (2021.08.)

The present invention has been devised to solve the problems of the prior art described above. Its object is to provide a system and method for predicting social-issue-based cyber targeted attacks using an attacker group similarity technique that can predict in advance whether a social-issue-based cyber targeted attack that has not yet occurred will occur, and that can also predict the time of occurrence relatively accurately.

A social-issue-based cyber targeted attack prediction system using an attacker group similarity technique according to an embodiment of the present invention preferably includes: a data collection unit (100) that receives security-related operational data from the operating servers of a plurality of institutions to which the system is to be applied; a learning processing unit (200) that generates training data sets for artificial intelligence learning from the security-related operational data received by the data collection unit (100) and performs learning on the training data sets using a plurality of artificial intelligence algorithms; a data input unit (300) that receives real-time operational data from at least one of the operating servers of the plurality of institutions; a risk analysis unit (400) that inputs the real-time operational data from the data input unit (300) into an ensemble of the plurality of artificial intelligence models generated by the learning processing unit (200) and calculates the risk of occurrence of a social-issue-based cyber targeted attack for the real-time operational data; and a prediction analysis unit (500) that uses a stored attack characteristic DB for each attacker group to predict the time of occurrence of the social-issue-based cyber targeted attack analyzed by the risk analysis unit (400).

Furthermore, the system preferably further includes: a first collection unit (600) that collects data on a plurality of past cyber targeted attacks; a second collection unit (700) that collects social-issue data for a predetermined period relative to the time at which each past cyber targeted attack occurred; a similarity analysis unit (800) that analyzes the similarity between the data collected by the first collection unit (600) and the second collection unit (700); and an attacker analysis unit (900) that groups the attackers behind past cyber targeted attacks based on the similarity analyzed by the similarity analysis unit (800), matches the collected data to each group to build a database, and transmits the database to the prediction analysis unit (500) for storage.

Furthermore, the prediction analysis unit (500) preferably performs the prediction analysis when the risk of occurrence calculated by the risk analysis unit (400) exceeds a preset risk threshold of the corresponding operating server.
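The threshold check described here can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `run_prediction_if_needed`, the per-server `thresholds` mapping, and the `predict` callback are all hypothetical names introduced for the example.

```python
from typing import Callable, Dict, Optional

def run_prediction_if_needed(
    server_id: str,
    risk: float,
    thresholds: Dict[str, float],
    predict: Callable[[str], str],
) -> Optional[str]:
    """Call `predict` only when the risk exceeds this server's preset threshold."""
    if risk > thresholds[server_id]:
        return predict(server_id)
    return None  # Risk is at or below the threshold: skip prediction analysis.

# Hypothetical per-server thresholds; each operating server may have its own.
thresholds = {"server_a": 0.7, "server_b": 0.9}

result_a = run_prediction_if_needed("server_a", 0.8, thresholds, lambda s: f"predict:{s}")
result_b = run_prediction_if_needed("server_b", 0.8, thresholds, lambda s: f"predict:{s}")
```

With the same risk value of 0.8, prediction runs for `server_a` but not for `server_b`, because the threshold is looked up per operating server.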

Furthermore, the learning processing unit (200) preferably performs parallel learning by applying the generated training data sets to each of a plurality of artificial intelligence algorithms.

Furthermore, the risk analysis unit (400) preferably uses an ensemble technique over the plurality of artificial intelligence models: it inputs the real-time operational data into each model, compares the analysis results that the models produce for that data, and calculates the risk of occurrence of a social-issue-based cyber targeted attack on the corresponding operating server.
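One way to picture this step is the sketch below. The patent does not state how the per-model results are combined, so simple averaging of scores in [0, 1] is an assumption, and `model_a`, `model_b`, `model_c`, and the record fields are hypothetical stand-ins for trained models and real operational data.

```python
from statistics import mean
from typing import Callable, Dict, List

Record = Dict[str, float]
Model = Callable[[Record], float]  # Returns an attack score in [0, 1].

def ensemble_risk(models: List[Model], record: Record) -> float:
    """Feed the same real-time record to every model and average the scores."""
    scores = [model(record) for model in models]
    return mean(scores)

# Hypothetical stand-ins for models produced by the learning step.
def model_a(r: Record) -> float:
    return min(1.0, r["failed_logins"] / 100)

def model_b(r: Record) -> float:
    return 1.0 if r["requests_per_s"] > 500 else 0.0

def model_c(r: Record) -> float:
    return 0.5

record = {"failed_logins": 50, "requests_per_s": 800}
risk = ensemble_risk([model_a, model_b, model_c], record)
```

Averaging is only one possible combination rule; a weighted average or a vote over thresholded scores would fit the same description.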

Furthermore, the system preferably further includes a follow-up processing unit (1000) that uses the risk of occurrence calculated by the risk analysis unit (400) or the time of occurrence predicted by the prediction analysis unit (500) to generate proactive response information matched to the corresponding operating server.

A method for predicting social-issue-based cyber targeted attacks using an attacker group similarity technique according to another embodiment of the present invention, in which each step is performed by a computer-implemented social-issue-based cyber targeted attack prediction system using an attacker group similarity technique, preferably includes: an operational data input step (S100) in which the data collection unit receives security-related operational data from the operating servers of a plurality of institutions to which the system is to be applied; a learning processing step (S200) in which the learning processing unit generates training data sets for artificial intelligence learning from the security-related operational data received in the operational data input step (S100) and performs learning on the training data sets using a plurality of artificial intelligence algorithms; a real-time data input step (S300) in which the data input unit receives real-time operational data from at least one of the operating servers of the plurality of institutions; a risk calculation step (S400) in which the risk analysis unit inputs the real-time operational data from the real-time data input step (S300) into an ensemble of the plurality of artificial intelligence models generated in the learning processing step (S200) and calculates the risk of occurrence of a social-issue-based cyber targeted attack for the real-time operational data; a prediction analysis step (S500) in which the prediction analysis unit uses a stored attack characteristic DB for each attacker group to predict the time of occurrence of the social-issue-based cyber targeted attack analyzed in the risk calculation step (S400); and a response step (S600) in which the follow-up processing unit uses the result of the risk calculation step (S400) or the prediction analysis step (S500) to generate proactive response information matched to the corresponding operating server and transmit it.

Furthermore, the learning processing step (S200) preferably performs parallel learning by applying the generated training data sets to each of a plurality of artificial intelligence algorithms.

Furthermore, the method preferably performs the prediction analysis step (S500) when the risk of occurrence of a social-issue-based cyber targeted attack calculated in the risk calculation step (S400) exceeds a preset risk threshold of the corresponding operating server.

Furthermore, the method preferably further includes: a past data collection step (S510) in which the first collection unit collects data on a plurality of past cyber targeted attacks and the second collection unit collects social-issue data for a predetermined period relative to the time at which each past cyber targeted attack occurred; a similarity analysis step (S520) in which the similarity analysis unit analyzes the similarity among the data collected in the past data collection step (S510); and an attacker analysis step (S530) in which the attacker analysis unit groups the attackers behind past cyber targeted attacks based on the similarity analyzed in the similarity analysis step (S520) and matches the data collected in the past data collection step (S510) to each group to build a database. The prediction analysis step (S500) preferably uses the database built in the attacker analysis step (S530) as the attack characteristic DB for each attacker group.
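The similarity analysis and grouping steps can be illustrated with a small sketch. The patent does not name a similarity measure or a grouping algorithm, so Jaccard similarity over feature sets and a greedy threshold rule are assumptions chosen for brevity; the attack names and feature tags are invented sample data.

```python
from typing import Dict, List, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_attacks(attacks: Dict[str, Set[str]], threshold: float) -> List[List[str]]:
    """Greedy grouping: an attack joins the first group whose first member
    is at least `threshold` similar; otherwise it starts a new group."""
    groups: List[List[str]] = []
    for name, features in attacks.items():
        for group in groups:
            if jaccard(features, attacks[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Invented sample data: social-issue tags plus attack traits per past attack.
past_attacks = {
    "attack_1": {"issue:election", "spear_phishing", "hwp_macro"},
    "attack_2": {"issue:election", "spear_phishing", "hwp_macro", "c2_https"},
    "attack_3": {"issue:olympics", "wiper", "supply_chain"},
}
groups = group_attacks(past_attacks, threshold=0.5)
```

Each resulting group stands in for one attacker group; storing the members' social-issue and attack features per group would correspond to the attack characteristic DB.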

With the configuration described above, the system and method of the present invention overcome the limitations of conventional countermeasures against cyber targeted attacks (defending only after the first occurrence has been detected). Data on past cyber targeted attacks are collected in connection with social-issue data, their similarity is analyzed, and the attacker groups and their attack characteristics are stored in a database and used, so that the occurrence of social-issue-based cyber targeted attacks that have not yet occurred can be predicted relatively accurately.

In particular, by adopting an ensemble technique that applies a plurality of artificial intelligence models, prediction results with high accuracy and reliability can be provided without a separate prediction algorithm such as a Kalman filter algorithm.

Accordingly, in view of the risk and ripple effects of social-issue-based cyber targeted attacks that are strategically carried out against multiple targets under the pretext of a particular social issue, the possibility of an attack can be predicted in advance more proactively and accurately.

FIG. 1 is an exemplary configuration diagram of a social-issue-based cyber targeted attack prediction system using an attacker group similarity technique according to an embodiment of the present invention.
FIG. 2 is an exemplary flowchart of a social-issue-based cyber targeted attack prediction method using an attacker group similarity technique according to an embodiment of the present invention.

Hereinafter, the social-issue-based cyber targeted attack prediction system and method using an attacker group similarity technique of the present invention are described in detail with reference to the accompanying drawings. The drawings are provided as examples so that the idea of the present invention can be sufficiently conveyed to those skilled in the art. Accordingly, the present invention is not limited to the drawings presented below and may be embodied in other forms. Like reference numerals denote like elements throughout the specification.

Unless otherwise defined, the technical and scientific terms used herein have the meanings commonly understood by those of ordinary skill in the art to which this invention belongs. Descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the present invention are omitted from the following description and the accompanying drawings.

In addition, a system means a set of components, including devices, mechanisms, and means, that are organized and interact regularly to perform a required function.

Conventional responses to cyber targeted attacks detect an attack occurring at the present time and serve to prevent the resulting secondary damage. That is, they can neither predict and defend against cyber targeted attacks that will occur in the future nor predict when such attacks will occur.

This amounts to detecting a cyber targeted attack that has already occurred and is merely a way of preventing secondary damage to institutions related to that attack; it provides no defense against the first cyber targeted attack.

Although this may not hold for every social-issue-based cyber targeted attack, when a response method that detects the attack only after the first attack has taken place is used, the cost of recovering from the damage caused by that first attack is naturally greater than the cost of predicting and preparing for the attack in advance.

To resolve the problems described above, the social-issue-based cyber targeted attack prediction system and method using an attacker group similarity technique according to an embodiment of the present invention analyze the social-issue data at the time past cyber targeted attacks occurred and the characteristics of the corresponding attacker groups, measure the similarity between them, and use the result to analyze operational data generated by the operating servers of a plurality of institutions, thereby predicting social-issue-based cyber targeted attacks.

That is, after the social issues and attacker group characteristics at the time past cyber targeted attacks occurred have been analyzed, the similarity between the analyzed data and the operational data generated by the operating servers of the plurality of institutions is calculated in order to predict social-issue-based cyber targeted attacks.

In particular, by using an ensemble technique that applies a plurality of artificial intelligence algorithms, the system and method according to an embodiment of the present invention can predict the likelihood of a social-issue-based cyber targeted attack, that is, both its occurrence and its time of occurrence, without a separate prediction algorithm such as a Kalman filter algorithm.

Techniques for defending against cyber targeted attacks using artificial intelligence have been developed before. However, they are not suited to social-issue-based cyber targeted attacks, for which operational data generated in diverse environments by diverse targeted servers must serve as the source of training data. Moreover, whether they use a single machine learning model or several, they merely perform "attack detection" as a defense, so they have clear limitations when it comes to predictive response.

The ensemble technique is explained first. It is a learning method in which parallel learning is performed using a plurality of artificial intelligence learning algorithms and the resulting prediction models are combined to produce a single prediction result with improved stability and predictive power.

That is, parallel learning is performed using a plurality of artificial intelligence learning algorithms, each resulting prediction model outputs its own analysis value, and these values are compared to derive the single best analysis value as the result.

Compared with using a single artificial intelligence learning model, this spreads performance across models, which can reduce overfitting, and it can yield better prediction performance even when each individual prediction model performs poorly.

When a single artificial intelligence learning algorithm is used and several kinds of training data (operational data that differ from institution to institution) come in, the learning result is not balanced across them, so even if it is applied to each institution, its effect is bound to be limited.

To solve this problem, as described above, the system and method according to an embodiment of the present invention use an ensemble technique that applies a plurality of artificial intelligence algorithms: parallel learning is performed using a plurality of artificial intelligence learning algorithms, and the resulting prediction models are combined to produce the most accurate analysis value (prediction value).

Representative ensemble techniques include majority voting, bagging, and pasting.

In the voting-based ensemble technique, a plurality of artificial intelligence learning algorithms are trained on the same training data, and a majority vote over the analysis values of the resulting models determines the final result. Voting-based ensembles are known to achieve higher accuracy than the best-performing individual artificial intelligence technique.
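A hard majority vote can be sketched in a few lines. The three threshold classifiers below are hypothetical stand-ins for models trained with different algorithms on the same data; only the voting logic is the point of the example.

```python
from collections import Counter
from typing import Callable, List, Sequence

Classifier = Callable[[Sequence[float]], str]

def majority_vote(classifiers: List[Classifier], x: Sequence[float]) -> str:
    """Return the label predicted by the largest number of classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for three differently trained models.
def clf_1(x: Sequence[float]) -> str:
    return "attack" if x[0] > 0.5 else "normal"

def clf_2(x: Sequence[float]) -> str:
    return "attack" if x[0] > 0.4 else "normal"

def clf_3(x: Sequence[float]) -> str:
    return "attack" if x[0] > 0.9 else "normal"

label = majority_vote([clf_1, clf_2, clf_3], [0.6])
```

Two of the three classifiers vote "attack" for this input, so the ensemble returns "attack". Using an odd number of voters avoids ties in the two-class case.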

The bagging and pasting ensemble techniques, by contrast, use multiple instances of the same artificial intelligence learning algorithm but feed each one randomly sampled training data, so the resulting models are each trained differently. The analysis values of the resulting models are then aggregated to predict a new result.

When training data are randomly fed to the multiple instances of the same artificial intelligence learning algorithm, sampling with replacement is called bagging, and sampling without replacement is called pasting.
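The difference between the two sampling schemes can be shown with Python's standard `random` module. The function names `bagging_sample` and `pasting_sample` are illustrative; the integer list stands in for training records.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def bagging_sample(data: Sequence[T], k: int, rng: random.Random) -> List[T]:
    """Draw k items WITH replacement, so the same item may appear more than once."""
    return rng.choices(data, k=k)

def pasting_sample(data: Sequence[T], k: int, rng: random.Random) -> List[T]:
    """Draw k items WITHOUT replacement, so no item repeats (requires k <= len(data))."""
    return rng.sample(data, k)

rng = random.Random(0)        # Seeded so that repeated runs draw the same samples.
data = list(range(10))        # Stand-in for ten training records.
bag = bagging_sample(data, 10, rng)
paste = pasting_sample(data, 10, rng)
```

In an ensemble, each model instance would be trained on its own sample drawn this way, which is what makes the resulting models differ from one another.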

FIG. 1 is an exemplary configuration diagram of a social-issue-based cyber targeted attack prediction system using an attacker group similarity technique according to an embodiment of the present invention. The system is described in detail below with reference to FIG. 1.

As shown in FIG. 1, the social-issue-based cyber targeted attack prediction system using an attacker group similarity technique according to an embodiment of the present invention preferably includes a data collection unit (100), a learning processing unit (200), a data input unit (300), a risk analysis unit (400), and a prediction analysis unit (500). Each component preferably operates by being included, separately or together, in at least one processing means including a computer.

Each component is described in detail below.

The data collection unit (100) preferably receives security-related operational data from each of the operating servers of a plurality of institutions to which the social-issue-based cyber targeted attack prediction system using an attacker group similarity technique is to be applied.

Here, the operating servers of the plurality of institutions are operating servers run by individuals, organizations, institutions, governments, and the like that are concerned about cyber targeted attacks, not necessarily social-issue-based ones, and wish to take countermeasures in advance.

The data collection unit 100 preferably receives from each operating server not only security-related operational data containing normal information but also security-related operational data containing attack information, specifically information on cyber targeted attacks that occurred in the past, whether social issue-based or not (security event logs containing time information, web traffic information, and the like).

The learning processing unit 200 preferably generates a training data set for artificial intelligence learning from the security-related operational data of each operating server received by the data collection unit 100. Specifically, the training data set is generated using the security-related operational data received from each operating server, that is, the normal operation information, attack operation information, social issue-based attack information, and the like contained in it.

In addition, the training data set is preferably generated by combining the security-related operational data of the operating servers in various ways. For example, when security-related operational data d, e, and f are input from operating servers a, b, and c, the data d, e, and f may each be made into a separate training data set or merged into a single training data set. If data d contains only normal operation information, data d may be made into one training data set and data e and f into another. The setting values applied in generating the training data sets are preferably set under the control of an external user so as to improve the prediction accuracy for social issue-based cyber targeted attacks.
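The combination options described above can be sketched as follows. This is a minimal illustration only: the record shape, the function name `build_training_sets`, and the three mode names are assumptions introduced here and do not appear in the specification.

```python
from typing import Dict, List

# Hypothetical record shape: each record carries a label, "normal" or "attack".
Record = Dict[str, str]


def build_training_sets(data_by_server: Dict[str, List[Record]],
                        mode: str) -> List[List[Record]]:
    """Combine per-server operational data into one or more training sets.

    mode="separate":          one training set per server's data.
    mode="merged":            a single training set holding every record.
    mode="split_normal_only": data that is entirely normal forms one set,
                              all remaining data forms another.
    """
    if mode == "separate":
        return [list(records) for records in data_by_server.values()]
    if mode == "merged":
        return [[r for records in data_by_server.values() for r in records]]
    if mode == "split_normal_only":
        normal_only: List[Record] = []
        mixed: List[Record] = []
        for records in data_by_server.values():
            is_normal = all(r["label"] == "normal" for r in records)
            (normal_only if is_normal else mixed).extend(records)
        # Drop a set that ended up empty.
        return [s for s in (normal_only, mixed) if s]
    raise ValueError(f"unknown mode: {mode}")


# Data d, e, f as in the example above; d holds normal records only.
data = {
    "d": [{"label": "normal"}, {"label": "normal"}],
    "e": [{"label": "normal"}, {"label": "attack"}],
    "f": [{"label": "attack"}],
}
```

The `mode` argument stands in for the user-controlled setting value mentioned in the text; a real deployment would presumably expose more options than these three.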

In addition, the learning processing unit 200 preferably performs learning processing on the generated training data set using a plurality of artificial intelligence algorithms.

Specifically, using an ensemble technique that applies a plurality of artificial intelligence algorithms, the generated training data sets are applied to each algorithm and learning processing is performed in parallel. As described above, whether the plurality of algorithms are all the same artificial intelligence learning algorithm or all different ones is preferably determined in the learning processing unit 200 by the specific variant of the configured ensemble technique, and the generation of the training data sets may likewise be controlled by that variant.

However, the learning processing unit 200 preferably performs learning processing with the plurality of artificial intelligence learning algorithms on training data sets generated from the same base data (the security-related operational data received from the operating servers of the plurality of institutions).
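As a rough sketch of parallel ensemble training on the same base data, the example below trains two deliberately simple learners concurrently. The specification does not name any particular algorithm; the two learners, the single-feature rows, and the function names are all assumptions chosen to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Toy training rows: (feature value, label), label 1 = attack, 0 = normal.
ROWS = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]


def train_midpoint(rows):
    """Learner 1: threshold halfway between the two class means."""
    normal_mean = mean(x for x, y in rows if y == 0)
    attack_mean = mean(x for x, y in rows if y == 1)
    cut = (normal_mean + attack_mean) / 2
    return lambda x: 1 if x > cut else 0


def train_nearest_neighbor(rows):
    """Learner 2: return the label of the closest training row."""
    return lambda x: min(rows, key=lambda r: abs(r[0] - x))[1]


def train_ensemble(rows, trainers):
    """Train every learner on the same rows in parallel.

    pool.map preserves the order of `trainers` in the returned list.
    """
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda trainer: trainer(rows), trainers))


models = train_ensemble(ROWS, [train_midpoint, train_nearest_neighbor])
```

Passing the same trainer several times would correspond to the "all the same algorithm" variant, while the list above corresponds to the "all different algorithms" variant.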

The data input unit 300 preferably receives real-time operational data from at least one of the operating servers of the plurality of institutions. Whereas the data received by the data collection unit 100 is historical data, the data received by the data input unit 300 is real-time data.

That is, once learning by the learning processing unit 200 is complete, operational data collected from that point onward is received as the real-time operational data.

The risk analysis unit 400 preferably inputs the real-time operational data from the data input unit 300 into an ensemble of the plurality of artificial intelligence models generated by the learning processing unit 200, and calculates the risk of occurrence of a social issue-based cyber targeted attack for that real-time operational data.

Specifically, the risk analysis unit 400 inputs the real-time operational data from the data input unit 300 into each of the plurality of artificial intelligence models generated by the learning processing unit 200, and combines the analysis results output by the individual models to calculate the risk of occurrence of a social issue-based cyber targeted attack for the real-time operational data.

Here, the risk analysis unit 400 compares the analysis results output by the individual artificial intelligence models and uses either the majority-vote result or the average-value result to calculate the risk of occurrence of a social issue-based cyber targeted attack on the corresponding operating server.

When results are produced with an ensemble of a plurality of artificial intelligence models, the settings can be controlled in various ways. However, given the nature of social issue-based cyber targeted attacks, the cost of a preventive response is relatively small compared with the cost of remediation should an attack occur, however unlikely, so the risk of occurrence is preferably calculated in a way that enables a more proactive response.
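The two aggregation options named above, majority vote and average, can be illustrated as follows. The 0 to 100 risk scale, the per-model cutoff of 50, and the rule that a tied vote counts as an attack are assumptions made here; the tie rule is one possible way to lean toward the more proactive response the text prefers.

```python
from typing import Sequence


def aggregate_risk(scores: Sequence[float],
                   method: str = "average",
                   attack_cutoff: float = 50.0) -> float:
    """Combine per-model risk scores (assumed 0-100) into one risk value.

    method="average":  arithmetic mean of the model scores.
    method="majority": 100.0 if at least half of the models score at or
                       above attack_cutoff, else 0.0. Counting a tie as an
                       attack is an assumption, not taken from the source.
    """
    if not scores:
        raise ValueError("at least one model score is required")
    if method == "average":
        return sum(scores) / len(scores)
    if method == "majority":
        votes = sum(1 for s in scores if s >= attack_cutoff)
        return 100.0 if votes * 2 >= len(scores) else 0.0
    raise ValueError(f"unknown method: {method}")
```

For example, with model scores of 90, 20, and 70, the average is 60 while the majority vote reports an attack because two of the three models are at or above the cutoff.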

The prediction analysis unit 500 preferably uses a stored attack characteristic DB for each attacker group to predict when the social issue-based cyber targeted attack analyzed by the risk analysis unit 400 will occur.

Specifically, the prediction analysis unit 500 performs the prediction analysis operation when the risk of occurrence calculated by the risk analysis unit 400 exceeds the risk threshold preset for the corresponding operating server.

In other words, when the risk of occurrence of a social issue-based cyber targeted attack calculated by the risk analysis unit 400 for the corresponding operating server exceeds the preset threshold, the prediction analysis unit 500 preferably goes on to predict when the attack will occur. Because security levels and the like differ from one operating server to another, the risk threshold may be set differently for each server.
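The per-server gating described above amounts to a threshold lookup. In this minimal sketch the server identifiers, the threshold values, and the fallback default are hypothetical.

```python
# Hypothetical per-server thresholds on an assumed 0-100 risk scale.
RISK_THRESHOLDS = {"B": 60.0, "C": 40.0}
DEFAULT_THRESHOLD = 50.0  # assumed fallback for servers without an entry


def needs_timing_prediction(server_id: str, risk: float,
                            thresholds=RISK_THRESHOLDS) -> bool:
    """Return True only when the risk strictly exceeds the server's threshold."""
    return risk > thresholds.get(server_id, DEFAULT_THRESHOLD)
```

A server holding more sensitive data would typically be given a lower threshold so that the timing prediction is triggered earlier.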

To store the attack characteristic DB for each attacker group in the prediction analysis unit 500, the social issue-based cyber targeted attack prediction system using the attacker group similarity technique according to an embodiment of the present invention preferably further includes, as shown in FIG. 1, a first collection unit 600, a second collection unit 700, a similarity analysis unit 800, and an attacker analysis unit 900.

The first collection unit 600 preferably collects data on a plurality of cyber targeted attacks that occurred in the past. The data collected through the first collection unit 600 identifies the type of each cyber targeted attack and includes, as of the time of the attack, information on the targeted institution, detailed attack type information, attacker information, attack code information, attack code type information, and the like.

The second collection unit 700 preferably collects social issue data for a predetermined period around the time of each of the past cyber targeted attacks collected through the first collection unit 600. That is, if data on past cyber targeted attacks A and B has been collected by the first collection unit 600, the second collection unit 700 preferably collects social issue data from a predetermined period before to a predetermined period after the time attack A occurred, and social issue data from a predetermined period before to a predetermined period after the time attack B occurred.
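The collection window around each past attack can be sketched as below. The specification leaves the "predetermined period" open, so the 14-day and 7-day defaults, the article record shape, and the function names are assumptions.

```python
from datetime import datetime, timedelta


def issue_window(attack_time: datetime,
                 before: timedelta = timedelta(days=14),
                 after: timedelta = timedelta(days=7)):
    """Return (start, end) of the social issue window around one attack."""
    return attack_time - before, attack_time + after


def articles_in_window(articles, attack_time, **window_kwargs):
    """Keep articles whose 'published' timestamp lies inside the window."""
    start, end = issue_window(attack_time, **window_kwargs)
    return [a for a in articles if start <= a["published"] <= end]


# Hypothetical past attack and crawled articles.
attack_a = datetime(2021, 3, 15)
articles = [
    {"title": "too early", "published": datetime(2021, 2, 1)},
    {"title": "before attack", "published": datetime(2021, 3, 10)},
    {"title": "after attack", "published": datetime(2021, 3, 20)},
    {"title": "too late", "published": datetime(2021, 4, 1)},
]
```

Running `articles_in_window` once per past attack yields the attack-by-attack social issue data that the later similarity analysis relies on.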

To this end, the second collection unit 700 preferably collects the social issue data by crawling the sites to be analyzed.

That is, the second collection unit 700 preferably crawls the sites to be analyzed, analyzes the various web document data (web page data and the like) collected from those sites, and collects the social issue data that arose in the relevant period.

Here, as for the social issue data, a social issue can be inferred by looking collectively at the articles that news media and the like are publishing as issues.

However, suppose articles published (issued, created, uploaded, and so on) by news media are collected and only their main keywords are extracted, and the analysis shows, for example, that the keyword 'government' appears frequently. That keyword alone cannot be deemed a social issue keyword, because the surrounding context is difficult to infer from it. Moreover, even if it were deemed a social issue, interpreting the social situation behind it later would be nearly impossible. With this in mind, the articles that news media and the like are publishing as issues are preferably collected and analyzed, and the core keywords found are extracted together with their related keywords and bundled into a single group, which is then derived as a social issue. In this way, the social issue data that arose in a specific period, and further the reasons for and progress of each social issue, can be organized and reviewed at a glance.

Accordingly, the social issue data collected by the second collection unit 700 may refer not to a single word but to the keywords that are at issue in a specific period, that is, a set of related keywords.
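One simple way to turn a core keyword plus its related keywords into a single issue group is keyword co-occurrence, sketched below. The specification does not state how related keywords are found, so co-occurrence counting, the function name `derive_issue`, and the sample articles are assumptions.

```python
from collections import Counter
from typing import List, Tuple


def derive_issue(articles: List[List[str]],
                 top_related: int = 2) -> Tuple[str, List[str]]:
    """Return (core keyword, related keywords) for a set of articles.

    Each article is a list of already-extracted keywords. The core keyword
    is the one appearing in the most articles; related keywords are those
    that most often appear in the same articles as the core keyword.
    """
    doc_freq = Counter(k for kws in articles for k in set(kws))
    core, _ = doc_freq.most_common(1)[0]
    co_occurrence = Counter(
        k for kws in articles if core in kws for k in set(kws) if k != core
    )
    related = [k for k, _ in co_occurrence.most_common(top_related)]
    return core, related


# Hypothetical keyword lists extracted from four crawled articles.
sample_articles = [
    ["government", "election", "budget"],
    ["government", "election"],
    ["government", "budget", "election"],
    ["government", "weather"],
]
core, related = derive_issue(sample_articles)
```

Here 'government' alone says little, but grouped with 'election' and 'budget' it points to a recognizable issue, which is the point the preceding paragraphs make.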

The similarity analysis unit 800 preferably analyzes the similarity among the data collected by the first collection unit 600 and the second collection unit 700. To restate, the first collection unit 600 collects attack-related data for each cyber targeted attack that occurred in the past, and the second collection unit 700 collects social issue data for the time at which at least one of the cyber targeted attacks collected by the first collection unit 600 occurred.

As described above, the first collection unit 600 and the second collection unit 700 do not collect data for a single cyber targeted attack only, so attack-related data and social issue data can be collected as a matched pair for each cyber targeted attack that occurred in the past.

Using these pairs, the similarity analysis unit 800 analyzes the similarity among the collected data and thereby the similarity among the cyber targeted attacks that occurred.

The attacker analysis unit 900 preferably groups the attackers behind past cyber targeted attacks on the basis of the similarity analyzed by the similarity analysis unit 800, matches the collected data to each group, and builds a database from it. The resulting database, the attack characteristic DB for each attacker group, is transmitted to the prediction analysis unit 500 and used to predict when a social issue-based cyber targeted attack will occur.

That is, the attacker analysis unit 900 preferably takes the similarity among cyber targeted attacks analyzed by the similarity analysis unit 800 and groups the attackers behind attacks whose similarity is at or above a certain value (for example, 80%). The collected data (the paired attack-related data and social issue data) that reveals the attack characteristics is then preferably matched to each attacker group and stored in a database.
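The grouping step can be sketched with a set-based similarity and a union-find structure. The specification does not define the similarity measure; Jaccard similarity over a flat set of attack and social issue features is an assumption used here, as are the attacker names and feature labels.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared features divided by all features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_attackers(attacks, threshold: float = 0.8):
    """Group attackers whose attacks are at least `threshold` similar.

    Groups are merged transitively using union-find.
    """
    parent = {a["attacker"]: a["attacker"] for a in attacks}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(attacks):
        for b in attacks[i + 1:]:
            if jaccard(a["features"], b["features"]) >= threshold:
                parent[find(a["attacker"])] = find(b["attacker"])

    groups = {}
    for a in attacks:
        groups.setdefault(find(a["attacker"]), set()).add(a["attacker"])
    return sorted(sorted(g) for g in groups.values())


# Hypothetical past attacks: attack data and issue keywords as one feature set.
attacks = [
    {"attacker": "alpha",
     "features": {"spear_phishing", "election", "gov_agency", "macro_doc", "kr"}},
    {"attacker": "beta",
     "features": {"spear_phishing", "election", "gov_agency", "macro_doc"}},
    {"attacker": "gamma",
     "features": {"ddos", "olympics", "broadcaster"}},
]
attacker_groups = group_attackers(attacks)
```

In this sample, 'alpha' and 'beta' share four of five features (similarity 0.8) and land in one group, while 'gamma' stays alone.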

Through the attack characteristic DB for each attacker group generated by the attacker analysis unit 900, attack characteristic data (information on the targeted institution, detailed attack type information, attacker information, attack code information, attack code type information, and social issue data) can easily be checked not for a single one-off cyber targeted attack but for a plurality of cyber targeted attacks that resemble one another.

Accordingly, when the risk of occurrence of a social issue-based cyber targeted attack analyzed by the risk analysis unit 400 exceeds the risk threshold preset for the corresponding operating server, the prediction analysis unit 500 can use the attack characteristic DB for each attacker group to check the attack characteristic data of the high-risk social issue-based cyber targeted attack and predict when it will occur.
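The specification does not state how the time of occurrence is derived from the attack characteristic DB. Purely as an illustration, the sketch below assumes the DB records, for each attacker group, the number of days that elapsed between a rise in issue exposure and each past attack, and uses the median of those lags. The DB layout, the `lag_days` field, and the median rule are all assumptions.

```python
from datetime import date, timedelta
from statistics import median

# Hypothetical slice of an attack characteristic DB for attacker group "X".
ATTACK_DB = {
    "X": {"issue_keywords": {"A"}, "lag_days": [3, 5, 4]},
}


def predict_attack_date(group: str, issue_observed_on: date,
                        db=ATTACK_DB) -> date:
    """Estimate the attack date as issue date plus the group's median lag."""
    lags = db[group]["lag_days"]
    return issue_observed_on + timedelta(days=round(median(lags)))


predicted = predict_attack_date("X", date(2021, 11, 17))
```

A richer DB could support other estimators, but any of them would rest on the same idea of reusing the timing pattern of similar past attacks.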

In addition, the social issue-based cyber targeted attack prediction system using the attacker group similarity technique according to an embodiment of the present invention preferably further includes a follow-up processing unit 1000, as shown in FIG. 1.

The follow-up processing unit 1000 preferably uses the risk of occurrence calculated by the risk analysis unit 400 or the time of occurrence predicted by the prediction analysis unit 500 to generate proactive response action information matched to the corresponding operating server.

As described above, the cost of resolving a cyber targeted attack once it has occurred is far greater than the cost incurred when an attack is predicted and defended against but does not actually occur, so it is preferable to mount the fullest possible defense on the basis of prediction.

For this reason, the follow-up processing unit 1000 preferably uses the risk of occurrence calculated by the risk analysis unit 400 or the time of occurrence predicted by the prediction analysis unit 500 to ensure that an appropriate response is made according to the tendencies of the social issue-based cyber targeted attack that is predicted to occur.

For example, suppose the attack characteristic DB for each attacker group holds data showing that, before attacker group 'X' attacked target server 'B', exposure of social issue keyword 'A' rose with 90% probability, and that whenever target server 'B' was attacked, target server 'C' was also attacked with 100% probability. In that case, when the risk of occurrence of a social issue-based cyber targeted attack on target server 'B' calculated by the risk analysis unit 400 exceeds 60, the risk threshold preset for target server 'B', it is possible to check whether exposure of social issue keyword 'A' has risen, to study the attack patterns of attacker group 'X' in advance and defend against them, and to notify target server 'C' that it too may be attacked.

Of course, because the likelihood of an attack differs between target server 'B' and target server 'C', the intensity of the proactive response actions may differ. However, if target server 'C' is the operating server of an institution holding more critical confidential data, the level of proactive response is preferably determined by the mere fact that an attack is possible, regardless of the difference in likelihood. Because such proactive response action information may differ for each operating server, it is not limited to any particular form.
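The 'X', 'A', 'B', 'C' example above can be written out as a small rule lookup. The `GroupPattern` record, its field names, and the wording of the generated actions are hypothetical; only the probabilities and the threshold of 60 come from the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GroupPattern:
    """One hypothetical row of the attack characteristic DB."""
    group: str
    target: str
    precursor_keyword: str
    precursor_prob: float
    linked_targets: Dict[str, float]  # follow-on target -> probability


PATTERNS = [GroupPattern("X", "B", "A", 0.9, {"C": 1.0})]
THRESHOLDS = {"B": 60.0}


def plan_response(server: str, risk: float,
                  patterns=PATTERNS, thresholds=THRESHOLDS) -> List[str]:
    """List proactive actions once the server's risk exceeds its threshold."""
    if risk <= thresholds[server]:
        return []
    actions = []
    for p in patterns:
        if p.target != server:
            continue
        actions.append(f"check exposure of social issue keyword {p.precursor_keyword}")
        actions.append(f"review attack patterns of attacker group {p.group}")
        for linked in p.linked_targets:
            actions.append(f"notify server {linked} of a possible attack")
    return actions
```

As the text notes, the actual response content would differ per operating server, so the returned strings stand in for whatever action records a deployment defines.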

FIG. 2 is an exemplary flowchart of a social issue-based cyber targeted attack prediction method using the attacker group similarity technique according to an embodiment of the present invention. The method is described in detail below with reference to FIG. 2.

As shown in FIG. 2, the social issue-based cyber targeted attack prediction method using the attacker group similarity technique according to an embodiment of the present invention preferably includes an operational data input step (S100), a learning processing step (S200), a real-time data input step (S300), a risk calculation step (S400), a prediction analysis step (S500), and a response step (S600). Each step is performed by a computer-implemented social issue-based cyber targeted attack prediction system, using the attacker group similarity technique, in an industrial control system.

Each step is described in detail below.

In the operational data input step (S100), the data collection unit 100 receives security-related operational data from each of the operating servers of a plurality of institutions to which the social issue-based cyber targeted attack prediction system using the attacker group similarity technique is to be applied.

Here, the operating servers of the plurality of institutions are operating servers run by individuals, groups, institutions, governments, and the like that are concerned about cyber targeted attacks, not necessarily social issue-based ones, and wish to take countermeasures in advance.

In the operational data input step (S100), not only security-related operational data containing normal information but also security-related operational data containing attack information is received from each operating server, specifically information on cyber targeted attacks that occurred in the past, whether social issue-based or not (security event logs containing time information, web traffic information, and the like).

In the learning processing step (S200), the learning processing unit 200 generates a training data set for artificial intelligence learning from the security-related operational data of each operating server received in the operational data input step (S100).

Specifically, the training data set is generated using the security-related operational data received from each operating server, that is, the normal operation information, attack operation information, social issue-based attack information, and the like contained in it.

In addition, in the learning processing step (S200), learning processing is performed on the generated training data set using a plurality of artificial intelligence algorithms.

Specifically, using an ensemble technique that applies a plurality of artificial intelligence algorithms, the generated training data sets are applied to each algorithm and learning processing is performed in parallel. As described above, whether the plurality of algorithms are all the same artificial intelligence learning algorithm or all different ones is preferably determined by the specific variant of the configured ensemble technique, and the generation of the training data sets may likewise be controlled by that variant.

However, in the learning processing step (S200), learning processing with the plurality of artificial intelligence learning algorithms is preferably performed on training data sets generated from the same base data (the security-related operational data received from the operating servers of the plurality of institutions).

In the real-time data input step (S300), the data input unit 300 receives real-time operational data from at least one of the operating servers of the plurality of institutions. That is, once learning in the learning processing step (S200) is complete, operational data collected from that point onward is received as the real-time operational data.

In the risk calculation step (S400), the risk analysis unit 400 inputs the real-time operational data from the real-time data input step (S300) into an ensemble of the plurality of artificial intelligence models generated in the learning processing step (S200), and calculates the risk of occurrence of a social issue-based cyber targeted attack for that real-time operational data.

Specifically, in the risk calculation step (S400), the real-time operational data from the real-time data input step (S300) is input into each of the plurality of artificial intelligence models generated in the learning processing step (S200), and the analysis results output by the individual models are combined to calculate the risk of occurrence of a social issue-based cyber targeted attack for the real-time operational data.

Here, the analysis results output by the individual artificial intelligence models are compared, and either the majority-vote result or the average-value result is used to calculate the risk of occurrence of a social issue-based cyber targeted attack on the corresponding operating server.

In the prediction analysis step (S500), the prediction analysis unit 500 uses the stored attack characteristic DB for each attacker group to predict when the social issue-based cyber targeted attack analyzed in the risk calculation step (S400) will occur.

Here, in the prediction analysis step (S500), the prediction analysis operation is performed when the risk of occurrence of a social issue-based cyber targeted attack analyzed in the risk calculation step (S400) exceeds the risk threshold preset for the corresponding operating server.

That is, in the prediction analysis step (S500), when the risk of occurrence of a social issue-based cyber targeted attack calculated in the risk calculation step (S400) for the corresponding operating server exceeds the preset threshold, the time at which the attack will occur is preferably predicted in addition. Because security levels and the like differ from one operating server to another, the risk threshold may be set differently for each server.

Here, to store the attack characteristic DB for each attacker group, the social issue-based cyber targeted attack prediction method using the attacker group similarity technique according to an embodiment of the present invention further includes, as shown in FIG. 2, a past data collection step (S510), a similarity analysis step (S520), and an attacker analysis step (S530).

Specifically, in the past data collection step (S510), the first collection unit 600 collects data on a plurality of cyber targeted attacks that occurred in the past. The collected data identifies the type of each cyber targeted attack and includes, as of the time of the attack, information on the targeted institution, detailed attack type information, attacker information, attack code information, attack code type information, and the like.

In addition, the second collection unit 700 collects social issue-related data for a predetermined period relative to the time at which each of the collected past cyber targeted attacks occurred.

That is, in the past data collection step (S510), if the first collection unit 600 has collected data related to past cyber targeted attacks A and B, the second collection unit 700 collects social issue data from a predetermined period before to a predetermined period after the time at which attack A occurred, and likewise social issue data from a predetermined period before to a predetermined period after the time at which attack B occurred.
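The windowed collection around attacks A and B can be sketched as below. The 30-day window on each side, the `(timestamp, text)` article layout, and the function names are assumptions made for illustration; the specification only speaks of a "predetermined period".

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Article = Tuple[datetime, str]  # (publication time, article text), assumed layout


def issues_in_window(attack_time: datetime, articles: List[Article],
                     before: timedelta = timedelta(days=30),
                     after: timedelta = timedelta(days=30)) -> List[str]:
    """Keep articles published from `before` the attack to `after` the attack."""
    start, end = attack_time - before, attack_time + after
    return [text for ts, text in articles if start <= ts <= end]


def collect_per_attack(attacks: Dict[str, datetime],
                       articles: List[Article]) -> Dict[str, List[str]]:
    """Build one social issue data set per past attack, keyed by attack name."""
    return {name: issues_in_window(t, articles) for name, t in attacks.items()}
```

Each past attack ends up with its own set of social issue data, which is what later allows attack-related data and social issue data to be matched attack by attack.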

To this end, the past data collection step (S510) collects the social issue data by crawling the sites to be analyzed.

The sites to be analyzed are crawled, and the various web document data (web page data, etc.) collected from those sites is analyzed to gather the social issue data for the period in question. Here, the social issue data refers to the social issues that can be inferred by looking collectively at the articles that media outlets and the like are publishing as issues.

However, consider the case in which articles published (issued, created, uploaded, etc.) by media outlets are collected and their main keywords are extracted. If, for example, analysis of news articles shows that the keyword 'government' appears frequently, it is difficult to infer the surrounding context from that keyword alone, so it cannot be concluded that it is a social issue keyword. Moreover, even if it were designated a social issue, interpreting the social situation behind it later would be nearly impossible. For this reason, it is preferable to collect and analyze the articles that media outlets and the like are publishing as issues and, centering on the core keyword found, to extract the associated keywords as well, bundle them into a single group, and derive that group as the social issue. In this way, the social issue data for a specific period, and further the reasons for and progress of that social issue, can be organized and checked at a glance.

Accordingly, the social issue data collected in the past data collection step (S510) is not simply a single word; it may denote the keywords that are at issue during a specific period, in other words a set of associated keywords.
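One possible way to turn a core keyword into a group of associated keywords is simple co-occurrence counting, sketched below. The specification does not name an extraction technique, so the document-frequency choice of core keyword, the co-occurrence ranking, the `top_related` parameter, and the function name are all assumptions. Each article is assumed to have already been reduced to a set of keywords.

```python
from collections import Counter
from typing import List, Set


def issue_keyword_group(articles: List[Set[str]], top_related: int = 3) -> List[str]:
    """Return [core keyword, related keywords...] for a batch of articles."""
    # Document frequency: in how many articles does each keyword appear?
    doc_freq = Counter(k for kws in articles for k in kws)
    if not doc_freq:
        return []
    # Core keyword: highest document frequency, ties broken alphabetically.
    core = max(sorted(doc_freq), key=lambda k: doc_freq[k])
    # Count keywords that appear in the same articles as the core keyword.
    co = Counter(k for kws in articles if core in kws for k in kws if k != core)
    ranked = sorted(co.items(), key=lambda kv: (-kv[1], kv[0]))
    return [core] + [k for k, _ in ranked[:top_related]]
```

In this sketch, 'government' on its own would not be reported as the issue; it is reported together with the keywords that most often accompany it, which supplies the context a single keyword lacks.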

In the similarity analysis step (S520), the similarity analysis unit 800 preferably analyzes the similarity between the data collected in the past data collection step (S510).

In other words, in the past data collection step (S510), attack-related data is collected for each cyber targeted attack that occurred in the past, and social issue data is collected for the time of occurrence of at least one cyber targeted attack linked (associated) with it.

Because the past data collection step (S510) does not collect data for only a single cyber targeted attack, the attack-related data and the social issue data can be collected as matched pairs, one pair for each past cyber targeted attack.

Using this, the similarity analysis step (S520) analyzes the similarity between the collected data sets and thereby determines the similarity between the cyber targeted attacks that occurred.
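The specification does not state which similarity measure is used. As one hedged illustration, each attack's matched data (attack attributes plus social issue keywords) can be flattened into a feature set and compared with the Jaccard index. The field names `attack_type`, `code_type`, `target_sector`, and `issue_keywords` are hypothetical.

```python
from typing import Set


def attack_features(record: dict) -> Set[str]:
    """Flatten one attack record into a set of 'field=value' features."""
    fields = ("attack_type", "code_type", "target_sector")  # assumed field names
    feats = {f"{f}={record[f]}" for f in fields if f in record}
    feats |= {f"issue={kw}" for kw in record.get("issue_keywords", ())}
    return feats


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def attack_similarity(r1: dict, r2: dict) -> float:
    """Similarity between two past attacks, in the range 0.0 to 1.0."""
    return jaccard(attack_features(r1), attack_features(r2))
```

Any other measure (for example, cosine similarity over weighted features) could be substituted; the point is only that the comparison covers both the attack-related data and the social issue data.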

In the attacker analysis step (S530), the attacker analysis unit 900 groups the attackers who carried out past cyber targeted attacks on the basis of the similarity analyzed in the similarity analysis step (S520), and matches the data collected in the past data collection step (S510) to each group to build a database.

In other words, the attacker analysis step (S530) preferably uses the similarity between cyber targeted attacks analyzed in the similarity analysis step (S520) to group the attackers behind attacks whose similarity is at or above a certain value (for example, 80%). The collected data (attack-related data and social issue data), which reveals the attack characteristics, is then matched to each attacker group and stored as a database.
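Grouping by a similarity cutoff can be sketched with a union-find structure, as below. The 80% value comes from the example in the text; the single-linkage behavior (a chain of pairs at or above the threshold lands in one group), the input layout, and the function name are assumptions. Each key of `sim` is assumed to be a pair of attacker identifiers whose attacks were compared.

```python
from typing import Dict, List, Tuple


def group_attackers(attackers: List[str],
                    sim: Dict[Tuple[str, str], float],
                    threshold: float = 0.8) -> List[List[str]]:
    """Merge attackers whose pairwise similarity is at or above the threshold."""
    parent = {a: a for a in attackers}

    def find(x: str) -> str:
        # Follow parent links to the group root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in sim.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for a in attackers:
        groups.setdefault(find(a), []).append(a)
    return sorted(sorted(g) for g in groups.values())
```

Note that with single linkage two attackers can share a group without being directly similar to each other; a stricter scheme would require every pair in a group to meet the threshold.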

Through the resulting database, the attack characteristic DB for each attacker group, attack characteristic data (target organization information, detailed attack type information, attacker information, attack code information, attack code type information, and social issue data) can readily be checked not just for a one-off cyber targeted attack but for a number of cyber targeted attacks that resemble one another.

In addition, the predictive analysis step (S500) uses the attack characteristic DB for each attacker group, that is, the data compiled into a database in the attacker analysis step (S530), to predict when a social issue-based cyber targeted attack will occur.

Accordingly, when the risk of occurrence of a social issue-based cyber targeted attack calculated in the risk calculation step (S400) exceeds the preset risk threshold of the corresponding operating server, the predictive analysis step (S500) can use the attack characteristic DB for each attacker group to check the attack characteristic data of the social issue-based cyber targeted attack with a high risk of occurrence and predict when it will occur.
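The specification does not disclose how the time of occurrence is derived from the attack characteristic DB. Purely as an illustrative assumption, one simple approach is to record, for each past attack in the matching attacker group, the number of days between the onset of the social issue and the attack, and to project the median of those lags forward from the current issue's onset. The function name and both inputs are hypothetical.

```python
from datetime import date, timedelta
from statistics import median
from typing import List


def predict_attack_date(issue_onset: date, past_lags_days: List[float]) -> date:
    """Project the median historical issue-to-attack lag from the issue onset.

    `past_lags_days` is assumed to come from the attacker group's records
    in the attack characteristic DB.
    """
    if not past_lags_days:
        raise ValueError("no historical lags available for this attacker group")
    return issue_onset + timedelta(days=round(median(past_lags_days)))
```

The median is used here only because it is robust to a single unusually early or late attack; nothing in the text requires it.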

In the response step (S600), the follow-up processing unit 1000 uses the result of the risk calculation step (S400) or the predictive analysis step (S500) to generate proactive response information matched to the corresponding operating server.

The response step (S600) ensures that an appropriate response is made according to the nature of the social issue-based cyber targeted attack that is predicted to occur. Because this proactive response information may differ for each operating server, it is not limited to any particular form.

Of course, since the likelihood of an attack differs between attack target server 'B' and attack target server 'C', the intensity of the proactive response measures may differ accordingly. However, if attack target server 'C' is the operating server of an organization that holds more important confidential data, it is preferable that the degree of proactive response be determined by the mere existence of a possibility of attack, regardless of the difference in likelihood.
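The rule for servers 'B' and 'C' can be expressed as a small decision function. The level names, the 0.5 cutoff, and the `holds_critical_data` flag are assumptions chosen only to make the rule concrete.

```python
def response_level(likelihood: float, holds_critical_data: bool) -> str:
    """Pick a proactive response level for one operating server."""
    if likelihood <= 0:
        return "none"
    if holds_critical_data:
        # Any possibility of attack is enough, regardless of how likely it is.
        return "maximum"
    # Otherwise the response scales with the likelihood (assumed cutoff).
    return "elevated" if likelihood >= 0.5 else "baseline"
```

Under these assumptions, server 'C' with a likelihood of only 0.1 but critical data receives a stronger response than server 'B' with a likelihood of 0.8.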

In other words, the social issue-based cyber targeted attack prediction system and method using the attacker group similarity technique according to an embodiment of the present invention have the advantage of being able to predict, relatively accurately, the occurrence of a social issue-based cyber targeted attack that has not yet occurred.

As described above, the present invention has been described with reference to specific matters such as concrete components and to limited embodiments and drawings, but these are provided only to aid a more general understanding of the present invention. The present invention is not limited to the above embodiment, and a person of ordinary skill in the art to which the present invention pertains can make various modifications and variations from this description.

Therefore, the spirit of the present invention should not be confined to the described embodiments, and not only the claims set out below but also everything equivalent to, or an equivalent modification of, those claims falls within the scope of the spirit of the present invention.

100: data collection unit
200: learning processing unit
300: data input unit
400: risk analysis unit
500: predictive analysis unit
600: first collection unit
700: second collection unit
800: similarity analysis unit
900: attacker analysis unit
1000: follow-up processing unit

Claims

A social issue-based cyber targeted attack prediction system using an attacker group similarity technique, comprising:
a data collection unit 100 that receives security-related operating data from the operating servers of a plurality of organizations to which the system is applied;
a learning processing unit 200 that generates, from the security-related operating data received by the data collection unit 100, learning data sets for artificial intelligence learning and performs learning processing on the learning data sets using a plurality of artificial intelligence algorithms;
a data input unit 300 that receives real-time operating data from at least one of the operating servers of the plurality of organizations;
a risk analysis unit 400 that, using an ensemble technique applying a plurality of artificial intelligence models generated by the learning processing unit 200, takes the real-time operating data from the data input unit 300 as input and calculates the risk of occurrence of a social issue-based cyber targeted attack for the real-time operating data;
a predictive analysis unit 500 that, using a stored attack characteristic DB for each attacker group, predicts when the social issue-based cyber targeted attack analyzed by the risk analysis unit 400 will occur;
a first collection unit 600 that collects data related to a plurality of cyber targeted attacks that occurred in the past;
a second collection unit 700 that collects social issue-related data for a predetermined period relative to the time at which each cyber targeted attack collected by the first collection unit 600 occurred;
a similarity analysis unit 800 that analyzes the similarity between the data collected by the first collection unit 600 and the second collection unit 700; and
an attacker analysis unit 900 that groups the attackers who carried out past cyber targeted attacks on the basis of the similarity analyzed by the similarity analysis unit 800, uses the collected data to match, for each attacker group, the data related to the cyber targeted attacks by the corresponding attackers with the corresponding social issue data, compiles the result into a database, and transmits it to the predictive analysis unit 500 for storage,
wherein the predictive analysis unit 500,
when the risk of occurrence of a social issue-based cyber targeted attack for the real-time operating data analyzed by the risk analysis unit 400 exceeds the preset risk threshold of the corresponding operating server,
uses the data compiled into a database by the attacker analysis unit 900 as the attack characteristic DB for each attacker group to predict when the social issue-based cyber targeted attack analyzed by the risk analysis unit 400 will occur.

(deleted)

The social issue-based cyber targeted attack prediction system using an attacker group similarity technique according to claim 1,
wherein the learning processing unit 200
performs parallel learning processing by applying each of the generated learning data sets, using an ensemble technique applying a plurality of artificial intelligence algorithms.

The social issue-based cyber targeted attack prediction system using an attacker group similarity technique according to claim 1,
wherein the risk analysis unit 400,
using an ensemble technique applying a plurality of artificial intelligence models, inputs the real-time operating data into each of the plurality of artificial intelligence models, compares and evaluates the analysis results for the real-time operating data from the plurality of artificial intelligence models, and calculates the risk of occurrence of a social issue-based cyber targeted attack on the corresponding operating server.

The social issue-based cyber targeted attack prediction system using an attacker group similarity technique according to claim 1,
further comprising
a follow-up processing unit 1000 that generates proactive response information matched to the corresponding operating server, using the risk of occurrence calculated by the risk analysis unit 400 or the time of occurrence predicted by the predictive analysis unit 500.

A social issue-based cyber targeted attack prediction method using an attacker group similarity technique, in which each step is performed by a computer-implemented social issue-based cyber targeted attack prediction system using an attacker group similarity technique, the method comprising:
an operating data input step (S100) in which a data collection unit receives security-related operating data from the operating servers of a plurality of organizations to which the system is applied;
a learning processing step (S200) in which a learning processing unit generates, from the security-related operating data received in the operating data input step (S100), learning data sets for artificial intelligence learning and performs learning processing on the learning data sets using a plurality of artificial intelligence algorithms;
a real-time data input step (S300) in which a data input unit receives real-time operating data from at least one of the operating servers of the plurality of organizations;
a risk calculation step (S400) in which a risk analysis unit, using an ensemble technique applying a plurality of artificial intelligence models generated in the learning processing step (S200), takes the real-time operating data from the real-time data input step (S300) as input and calculates the risk of occurrence of a social issue-based cyber targeted attack for the real-time operating data;
a predictive analysis step (S500) in which a predictive analysis unit, using a stored attack characteristic DB for each attacker group, predicts when the social issue-based cyber targeted attack analyzed in the risk calculation step (S400) will occur; and
a response step (S600) in which a follow-up processing unit uses the result of the risk calculation step (S400) or the predictive analysis step (S500) to generate and transmit proactive response information matched to the corresponding operating server,
and, in order to store the attack characteristic DB for each attacker group, further comprising:
a past data collection step (S510) in which a first collection unit collects data related to a plurality of cyber targeted attacks that occurred in the past, and a second collection unit collects social issue-related data for a predetermined period relative to the time at which each collected cyber targeted attack occurred;
a similarity analysis step (S520) in which a similarity analysis unit analyzes the similarity between the data collected in the past data collection step (S510); and
an attacker analysis step (S530) in which an attacker analysis unit groups the attackers who carried out past cyber targeted attacks on the basis of the similarity analyzed in the similarity analysis step (S520), uses the collected data to match, for each attacker group, the data related to the cyber targeted attacks by the corresponding attackers with the corresponding social issue data, and compiles the result into a database,
wherein the predictive analysis step (S500),
when the risk of occurrence of a social issue-based cyber targeted attack for the real-time operating data analyzed in the risk calculation step (S400) exceeds the preset risk threshold of the corresponding operating server,
uses the data compiled into a database in the attacker analysis step (S530) as the attack characteristic DB for each attacker group to predict when the social issue-based cyber targeted attack analyzed in the risk calculation step (S400) will occur.

The social issue-based cyber targeted attack prediction method using an attacker group similarity technique according to claim 7,
wherein the learning processing step (S200)
performs parallel learning processing by applying each of the generated learning data sets, using an ensemble technique applying a plurality of artificial intelligence algorithms.

(deleted)