KR20230077316A

KR20230077316A - Malicious URL detection method using artificial intelligence technology

Info

Publication number: KR20230077316A
Application number: KR1020210164466A
Authority: KR
Inventors: 강필상; 김지훈
Original assignee: 강필상
Priority date: 2021-11-25
Filing date: 2021-11-25
Publication date: 2023-06-01
Also published as: KR102558952B1

Abstract

Disclosed is a malicious URL detection method using artificial intelligence technology. The method includes: detecting, by a user terminal, whether received text data contains a URL (uniform resource locator); when the text data contains a URL, performing, by the user terminal, first data preprocessing that separates the text from the URL, expands the separated URL to restore the original URL if it is a shortened URL, and collects first detection data related to the restored URL; transmitting, by the user terminal, the first preprocessed information and a URL state determination request message to a detection server; determining, by the detection server, the state of the URL by comparing the information received from the user terminal with pre-stored URL information; and transmitting, by the detection server, the determined result to the user terminal.

Description

Malicious URL detection method using artificial intelligence technology

The present invention relates to malicious URL (uniform resource locator) detection technology, and more particularly to a malicious URL detection method using artificial intelligence technology that detects malicious URLs based on statistical analysis and analysis by an artificial intelligence model, and thereby supports danger notification and access blocking for users.

Recently, attacks that cause malicious code to be downloaded through websites have occurred frequently. These attacks pose a serious threat because they proceed without the user's knowledge.

A PC (personal computer) infected by such an attack causes further damage by leaking the user's personal information or by becoming a zombie PC that carries out additional attacks.

Accordingly, to protect users' PCs from these threats, research on detecting malicious websites in advance using web crawlers is being actively conducted. In particular, the techniques mainly in use judge a site to be malicious by downloading and inspecting files from the website that are suspected of being malicious.

However, these conventional techniques inspect entire websites and decide whether a website is malicious by downloading and inspecting suspicious files, so they consume considerable time and cost.

Korean Registered Patent Publication No. 10-1545964 (2015.08.21.)

A technical object of the present invention is to provide a malicious URL detection method using artificial intelligence technology that determines whether a URL included in a text message, mail, network packet, or the like is malicious, and supports danger notification and access blocking.

Another technical object of the present invention is to provide a malicious URL detection method using artificial intelligence technology that detects malicious URLs based on statistical analysis and analysis by an artificial intelligence model, and automatically updates the detection results so that the latest malicious URLs are also managed.

To achieve the above objects, a malicious URL detection method using artificial intelligence technology according to the present invention includes: detecting, by a user terminal, whether received text data contains a URL (uniform resource locator); when the text data contains a URL, performing, by the user terminal, first data preprocessing that separates the text from the URL, expands the separated URL to restore the original URL if it is a shortened URL, and collects first detection data related to the restored URL; transmitting, by the user terminal, the first preprocessed information and a URL state determination request message to a detection server; determining, by the detection server, the state of the URL by comparing the information received from the user terminal with pre-stored URL information; and transmitting, by the detection server, the determined result to the user terminal.

The method may further include: accessing, by the user terminal, the URL when the determined result is normal; and blocking, by the user terminal, the URL when the determined result is malicious.

The method may further include: when the determined result indicates a new URL, performing, by the user terminal, second data preprocessing that uses a web portal to additionally collect second detection data different from the first detection data; transmitting, by the user terminal, the text data, the first preprocessed information, the second preprocessed information, and a URL state determination re-request message to the detection server; performing, by the detection server, third data preprocessing that preprocesses the text included in the information received from the user terminal; re-determining, by the detection server, the state of the URL by performing statistical analysis and analysis through a second artificial intelligence model based on the first preprocessed information, the second preprocessed information, the third preprocessed information, and learning information learned by a first artificial intelligence model; and transmitting, by the detection server, the re-determined result to the user terminal.

The method may further include: training, by the detection server, the first artificial intelligence model on text-related information; and training, by the detection server, the second artificial intelligence model on URL-related information.

The method may further include: accessing, by the user terminal, the URL when the re-determined result is normal; and blocking, by the user terminal, the URL when the re-determined result is malicious.

The malicious URL detection method using artificial intelligence technology of the present invention can support danger notification and access blocking by determining, step by step according to the situation, whether a URL included in a text message, mail, network packet, or the like is malicious.

In addition, by detecting malicious URLs based on at least one of statistical analysis and analysis by an artificial intelligence model, and by automatically updating the detection results so that the latest malicious URLs are also managed, the method improves security while making maintenance easier.

FIG. 1 is a configuration diagram for explaining a malicious URL detection system according to an embodiment of the present invention.
FIG. 2 is a block diagram for explaining a user terminal according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining first data preprocessing according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining second data preprocessing according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining a detection server according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining third data preprocessing according to an embodiment of the present invention.
FIG. 7 is a diagram for explaining analysis using an artificial intelligence model according to an embodiment of the present invention.
FIGS. 8 and 9 are diagrams for explaining the performance of malicious URL detection according to an embodiment of the present invention.
FIG. 10 is a flowchart for explaining a malicious URL detection method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. In assigning reference numerals to the components of each drawing, note that the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related known configurations or functions are omitted when they would be obvious to those skilled in the art or could obscure the gist of the present invention.

FIG. 1 is a configuration diagram for explaining a malicious URL detection system according to an embodiment of the present invention.

Referring to FIG. 1, the malicious URL detection system 500 determines whether a URL included in a text message, mail, network packet, or the like is malicious, and supports danger notification and access blocking. The malicious URL detection system 500 detects malicious URLs based on statistical analysis and analysis by an artificial intelligence model, and automatically updates the detection results so that the latest malicious URLs are also managed. The malicious URL detection system 500 includes a user terminal 100, a detection server 200, a web portal server 300, and an external server 400.

The user terminal 100 is a terminal used by a user (client) and provides various services. As one of these services, the user terminal 100 requests the detection server 200 to determine whether a URL included in a text message, mail, network packet, or the like is malicious. The user terminal 100 accesses or blocks the URL according to the result determined by the detection server 200. The user terminal 100 may be a personal computing system such as a smartphone, desktop, laptop, tablet PC, or handheld PC.

The detection server 200 is a server that determines whether a URL is normal or malicious. When the detection server 200 receives, from the user terminal 100, a message requesting a determination of whether a URL is malicious, it analyzes the URL and transmits the analysis result to the user terminal 100. The detection server 200 may determine whether the URL is malicious based on pre-stored URL information, or based on statistical analysis and analysis through an artificial intelligence model. The detection server 200 may also learn from the determined results to improve the accuracy of later analyses. The detection server 200 may be a server computing system such as a server computer or a cluster computer.

The web portal server 300 is a server that provides a web portal. Although the web portal server 300 is shown as a single server in the drawing, it may be implemented as a plurality of servers, which allows it to provide various web portals. For example, the web portal server 300 may provide the portal of company G, the portal of company N, the portal of company D, and so on. The web portal server 300 may be a server computing system such as a server computer or a cluster computer.

The external server 400 is a server that supports the web portal connected to the final destination of the URL. That is, the user terminal 100 connects to the external server 400 when the detection server 200 determines that the URL is normal, and does not connect to it when the detection server 200 determines that the URL is malicious. The external server 400 may be a server computing system such as a server computer or a cluster computer.

Meanwhile, the malicious URL detection system 500 establishes a communication network 550 among the user terminal 100, the detection server 200, the web portal server 300, and the external server 400 to support communication between them. The communication network 550 may consist of a backbone network and a subscriber network. The backbone network may consist of one of, or an integration of several of, an X.25 network, a Frame Relay network, an ATM network, an MPLS (Multi Protocol Label Switching) network, and a GMPLS (Generalized Multi Protocol Label Switching) network. The subscriber network may be FTTH (Fiber To The Home), ADSL (Asymmetric Digital Subscriber Line), a cable network, Zigbee, Bluetooth, Wireless LAN (IEEE 802.11b, IEEE 802.11a, IEEE 802.11g, IEEE 802.11n), Wireless Hart (ISO/IEC62591-1), ISA100.11a (ISO/IEC 62734), COAP (Constrained Application Protocol), MQTT (Multi-Client Publish/Subscribe Messaging), WiBro (Wireless Broadband), WiMAX, 3G, HSDPA (High Speed Downlink Packet Access), 4G, or 5G. In some embodiments, the communication network 550 may be the Internet or a mobile communication network. The communication network 550 may also include any other wireless or wired communication scheme that is widely known or developed in the future.

FIG. 2 is a block diagram for explaining a user terminal according to an embodiment of the present invention, FIG. 3 is a diagram for explaining first data preprocessing according to an embodiment of the present invention, and FIG. 4 is a diagram for explaining second data preprocessing according to an embodiment of the present invention.

Referring to FIGS. 1 to 4, the user terminal 100 includes a communication unit 10, a control unit 30, an output unit 50, and a storage unit 70.

The communication unit 10 communicates with at least one of the detection server 200, the web portal server 300, and the external server 400. The communication unit 10 transmits and receives the information used to determine whether a URL included in text data is malicious.

The control unit 30 performs overall control of the user terminal 100. The control unit 30 extracts the URL included in received text data, transmits a determination request for the extracted URL to the detection server 200, receives the result indicating whether the URL is malicious, and controls access to the URL based on that result. The control unit 30 accesses the URL if it is normal and blocks the URL if it is malicious. The control unit 30 includes a URL detection unit 31, a data collection unit 33, and a URL blocking unit 35.

The URL detection unit 31 detects whether a URL is included when it receives text data, such as a text message, mail, or network packet, from outside. When a URL is detected in the text data, the URL detection unit 31 extracts it. That is, the URL detection unit 31 separates the text of the text data from the URL. If the separated URL is a shortened URL, the URL detection unit 31 expands it to restore the original URL. The URL detection unit 31 then performs first data preprocessing, which collects first detection data related to the restored URL (FIG. 3). The first detection data includes the redirect count, which is the number of URL redirects that occur when the user terminal 100 accesses the URL, and information on the final destination URL. The information on the final destination URL includes at least one of an IP address, a domain name, and the HTTP header parameters content-length, content-type, content-encoding, and content-disposition. Here, content-length is the length of the body of the URL response packet, content-type is the format of the URL response packet, content-encoding is the encoding scheme of the URL response packet, and content-disposition is the handling format of the URL response packet. The URL detection unit 31 controls transmission of the first preprocessed information and a URL state determination request message to the detection server 200 so that the detection server 200 can determine the state of the URL.
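The first data preprocessing described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the regular expression, the function names `split_text_and_urls` and `first_detection_data`, and the representation of a redirect chain as a list of `(url, headers)` pairs are all assumptions made for illustration. The sketch performs no network access; the chain is assumed to have been recorded by an HTTP client that follows redirects, and IP address resolution is left out.

```python
import re
from urllib.parse import urlsplit

# Assumed pattern: any http(s) URL up to whitespace or a quote/bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

# HTTP header parameters named in the description.
HEADER_KEYS = ("content-length", "content-type",
               "content-encoding", "content-disposition")


def split_text_and_urls(message):
    """Separate the plain text of a message from the URLs it contains."""
    urls = URL_PATTERN.findall(message)
    text = " ".join(URL_PATTERN.sub(" ", message).split())
    return text, urls


def first_detection_data(hops):
    """Build first detection data from a recorded redirect chain.

    hops: list of (url, response_headers) in request order; the last
    entry is the final destination (hypothetical input format).
    """
    final_url, final_headers = hops[-1]
    lowered = {name.lower(): value for name, value in final_headers.items()}
    record = {
        "redirect_count": len(hops) - 1,  # every hop before the last redirected
        "final_url": final_url,
        "domain_name": urlsplit(final_url).hostname,
    }
    for key in HEADER_KEYS:
        record[key] = lowered.get(key)  # None when the header is absent
    return record
```

In this sketch a shortened URL is "expanded" simply by reading the last hop of the recorded chain, so the restored URL and the redirect count come from the same data.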

When the determination result received from the detection server 200 indicates a new URL, the data collection unit 33 collects further information about the URL so that the detection server 200 can re-determine the state of the URL. Specifically, the data collection unit 33 runs a search on the web portal provided by the web portal server 300, using the domain name included in the first detection data collected by the URL detection unit 31 as the search term. In this way the data collection unit 33 performs second data preprocessing, which collects second detection data, that is, additional data found by searching for the domain name (FIG. 4). The second detection data differs from the first detection data and includes at least one of the following, taken from the HTML source of the search results: an a-tag href attribute count, a search result count, a domain name count, and a keyword count. Here, the a-tag href attribute count is the number of 'href' attribute values of 'a' tags in the HTML source that match the domain name; the search result count is the number of search result entries; the domain name count is the number of strings in the search result text that match the domain name; and the keyword count is the number of occurrences, in the search result text, of keywords previously configured as being closely associated with malicious URLs. The data collection unit 33 controls transmission of the text data, the first preprocessed information, the second preprocessed information, and a URL state determination re-request message to the detection server 200 so that the detection server 200 can re-determine the state of the URL.
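The four counts in the second detection data can be sketched in Python with the standard-library HTML parser. This is an illustration under assumptions, not the patent's code: the function name `second_detection_data` is hypothetical, a "match" is taken to mean case-insensitive substring containment, and the search result count is passed in by the caller because the way a portal reports it is not specified in the description.

```python
from html.parser import HTMLParser


class _SearchPageParser(HTMLParser):
    """Collects href values of <a> tags and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def second_detection_data(html, domain_name, keywords, result_count):
    """Count domain and keyword occurrences in a search result page."""
    parser = _SearchPageParser()
    parser.feed(html)
    text = " ".join(parser.text_parts).lower()
    domain = domain_name.lower()
    return {
        # href values of <a> tags that contain the domain name
        "href_count": sum(domain in href.lower() for href in parser.hrefs),
        # number of search results, supplied by the caller (assumption)
        "result_count": result_count,
        # occurrences of the domain name in the result text
        "domain_name_count": text.count(domain),
        # occurrences of the preconfigured keywords in the result text
        "keyword_count": sum(text.count(k.lower()) for k in keywords),
    }
```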

The URL blocking unit 35 blocks the URL based on the result of the URL state determined or re-determined by the detection server 200. That is, when the determined or re-determined URL state is malicious, the URL blocking unit 35 blocks the URL so that access to it is prevented in advance.

The output unit 50 outputs text data such as text messages, mail, and network packets. The output unit 50 outputs the result of the URL state determination. When the URL is normal, the output unit 50 outputs the user interface of the accessed URL. The output unit 50 may be a display with a touchscreen function, and the display may include at least one of a liquid crystal display (LCD), a thin film transistor-liquid crystal display (TFT LCD), an organic light-emitting diode (OLED) display, a flexible display, and a 3D display.

The storage unit 70 stores programs or algorithms for driving the user terminal 100. The storage unit 70 stores text data such as text messages, mail, and network packets. The storage unit 70 stores the result of the URL state determination. The storage unit 70 may include at least one storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), RAM (Random Access Memory), SRAM (Static Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), magnetic memory, a magnetic disk, and an optical disk.

Although not shown in the drawings, the user terminal 100 may further include a separate input unit. The input unit may include a microphone, a keypad, buttons, and the like, through which voice, text, and so on can be entered.

FIG. 5 is a diagram for explaining a detection server according to an embodiment of the present invention, FIG. 6 is a diagram for explaining third data preprocessing according to an embodiment of the present invention, and FIG. 7 is a diagram for explaining analysis using an artificial intelligence model according to an embodiment of the present invention.

Referring to FIG. 1 and FIGS. 5 to 7, the detection server 200 includes a server communication unit 210, a server control unit 230, and a server storage unit 250.

The server communication unit 210 communicates with the user terminal 100. The server communication unit 210 transmits and receives the information used to determine whether a URL included in text data is malicious.

The server control unit 230 performs overall control of the detection server 200. The server control unit 230 manages and analyzes URL-related data, and automatically learns from the analysis results so that the latest data is always reflected. The server control unit 230 includes a data management unit 231 and a data analysis unit 233.

The data management unit 231 manages URL-related data. When the data management unit 231 receives the first preprocessed information and a URL state determination request message from the user terminal 100, it determines the state of the URL by comparing the domain name or IP address included in the first preprocessed information with pre-stored URL information. That is, the data management unit 231 searches the pre-stored URL information for the domain name or IP address and, if the same domain name or IP address is found, determines whether that domain name or IP address belongs to a normal URL or a malicious URL. If the domain name or IP address is not found in the pre-stored URL information, the data management unit 231 judges the URL to be a new URL. Here, the URL information is information that classifies URLs as normal or malicious. The data management unit 231 controls transmission of the determined result to the user terminal 100 so that the user terminal 100 can act according to the state of the URL.
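The lookup performed by the data management unit 231 amounts to a three-way classification: normal, malicious, or new. A minimal Python sketch follows, assuming the pre-stored URL information is a dictionary keyed by domain name or IP address; the names `UrlState` and `judge_url_state` and the storage format are illustrative assumptions, not details from the patent.

```python
from enum import Enum


class UrlState(Enum):
    NORMAL = "normal"
    MALICIOUS = "malicious"
    NEW = "new"  # not present in the pre-stored URL information


def judge_url_state(known, domain_name=None, ip_address=None):
    """Look up a domain name or IP address in pre-stored URL information.

    known: dict mapping a domain name or IP address to UrlState.NORMAL
    or UrlState.MALICIOUS (hypothetical storage format).
    """
    for key in (domain_name, ip_address):
        if key is not None and key in known:
            return known[key]
    # Neither identifier is stored, so the URL is treated as new.
    return UrlState.NEW
```

A `NEW` result is what would lead the user terminal to run the second data preprocessing and send the re-request message.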

When the data analysis unit 233 receives the first preprocessed information, the second preprocessed information, and a URL state determination re-request message from the user terminal 100, it performs third data preprocessing, which preprocesses the text included in the received information. Here, the text means the text of the text message or mail and the search result text. The data analysis unit 233 tokenizes the text message/mail text and the search result text separately, and then removes noise data (special characters). The data analysis unit 233 also performs stemming and removes unnecessary terms (words that carry no particular meaning, configured values, and the like). The data analysis unit 233 pads the result to match the input text size, and calculates an accuracy for malicious text/mail using a pre-trained first artificial intelligence model, an LSTM (Long Short-Term Memory) model. That is, the data analysis unit 233 can calculate a text message/mail text accuracy and an HTML text accuracy (FIG. 6).
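The text preparation steps of the third data preprocessing (tokenize, remove special characters, stem, drop unnecessary terms, pad) can be sketched in plain Python as follows. The stop-word list, the suffix-stripping stemmer, the vocabulary, and the reserved ids are all simplifying assumptions for illustration; the patent does not specify them, and the sketch handles only lowercase ASCII text. The LSTM model itself is not shown.

```python
import re

# Hypothetical stop-word list; the patent only says "unnecessary terms".
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "your"}
PAD_ID = 0  # id used to pad short sequences (assumption)
UNK_ID = 1  # id for tokens missing from the vocabulary (assumption)


def naive_stem(token):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, strip special characters, stem, then drop stop words."""
    tokens = text.lower().split()
    tokens = [re.sub(r"[^0-9a-z]", "", t) for t in tokens]
    tokens = [naive_stem(t) for t in tokens if t]
    return [t for t in tokens if t not in STOP_WORDS]


def encode_and_pad(text, vocab, max_len):
    """Map tokens to integer ids and pad or truncate to max_len."""
    ids = [vocab.get(t, UNK_ID) for t in preprocess(text)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))
```

The fixed-length id sequence returned by `encode_and_pad` is the kind of input a sequence model such as an LSTM typically expects.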

The data analysis unit 233 re-determines the status of the URL by performing at least one of statistical analysis and analysis through an artificial intelligence model, based on the first data preprocessed information, the second data preprocessed information, the third data preprocessed information, and the learning information learned by the first artificial intelligence model. For the statistical analysis, the data analysis unit 233 sets reference values for normal URLs and malicious URLs using statistical figures of the collected information, including the mean, minimum, maximum, variance, required values, and outliers, and determines whether the URL currently under inspection is malicious based on the set reference values. For the model-based analysis, the data analysis unit 233 applies the collected information to a Maximum Entropy Model, which is a pre-trained second artificial intelligence model, to determine whether the URL currently under inspection is malicious. The data analysis unit 233 causes the determined result to be transmitted to the user terminal 100 so that the user terminal 100 can perform control according to the status of the URL.
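As one possible reading of the statistical analysis, reference values could be derived from summary statistics of a single numeric feature collected for known normal and known malicious URLs. The sketch below is illustrative only: the choice of feature (for example, a malicious keyword count) and the midpoint-of-means rule in `reference_value` are assumptions, since the specification does not say how the reference values are computed.

```python
from statistics import mean, pvariance


def summarize(values):
    """Summary statistics of one feature over a set of collected samples."""
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "variance": pvariance(values),
    }


def reference_value(normal_values, malicious_values):
    """Assumed rule: place the threshold midway between the two class means."""
    return (mean(normal_values) + mean(malicious_values)) / 2


def is_malicious(value, threshold):
    """Flag a URL whose feature value exceeds the reference value."""
    return value > threshold
```

For example, with keyword counts `[0, 1, 1, 2]` for normal URLs and `[6, 8, 7, 9]` for malicious URLs, the midpoint rule gives a reference value of 4.25, so a count of 5 would be flagged and a count of 2 would not.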

Here, before determining whether a URL is malicious, the data analysis unit 233 may train the first artificial intelligence model on text-related information and the second artificial intelligence model on URL-related information. The data analysis unit 233 may also train the first and second artificial intelligence models on the results of determining whether URLs are malicious.

The server storage unit 250 stores programs or algorithms for driving the detection server 200. The server storage unit 250 also stores the URL information that classifies URLs as normal or malicious, the first through third data preprocessed information, and the results of URL status determinations. The server storage unit 250 may include at least one storage medium among a flash memory type, a hard disk type, a media card micro type, a card type memory (for example, SD or XD memory), RAM, SRAM, ROM, EEPROM, PROM, magnetic memory, a magnetic disk, and an optical disk.

FIGS. 8 and 9 are diagrams for explaining the performance of malicious URL detection according to an embodiment of the present invention.

FIG. 8 is a graph showing the accuracy that the LSTM model produces for the text of text messages/emails, where the X-axis is the text number and the Y-axis is the accuracy. It can be seen that the malicious URL detection system 500 yields low accuracy values for normal text messages and emails, such as Nos. 3 and 7, and high accuracy values for malicious text messages and emails, such as Nos. 1, 4, and 9.

FIG. 9 is a graph showing counts of malicious keywords in text, where the X-axis is the search result text number and the Y-axis is the number of occurrences of malicious keywords in the text. Having counted the malicious keywords contained in the web portal search results, the malicious URL detection system 500 yields relatively high counts for malicious URLs, such as Nos. 4 and 8.

In this way, the malicious URL detection system 500 collects various pieces of information that can serve as a basis for judging a URL to be malicious and analyzes them using the maximum entropy model, thereby improving the detection rate and reliability for malicious URLs.

FIG. 10 is a flowchart illustrating a malicious URL detection method according to an embodiment of the present invention.

Referring to FIG. 10, the malicious URL detection method determines, step by step and according to the situation, whether a URL included in a text message, an email, a network packet, or the like is malicious, and can thereby support danger notification and access blocking. The method detects malicious URLs based on at least one of statistical analysis and analysis by an artificial intelligence model, and automatically updates the detection results so that the latest malicious URLs are also managed, which improves security while simplifying maintenance.

In step S101, the user terminal 100 monitors data. When the user terminal 100 receives text data such as a text message, an email, or a network packet, it detects whether the received text data contains a URL.

In step S103, the user terminal 100 performs first data preprocessing when a URL is detected in the text data. The user terminal 100 separates the text and the URL in the text data and, if the separated URL is a shortened URL, expands it to restore the original URL. The user terminal 100 then collects first detection data related to the restored URL.
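By way of example only, the separation of text and URL in step S103 and the check for a shortened URL might look like the sketch below. The regular expression, the `SHORTENER_HOSTS` set, and the fields returned by `first_detection_data` are assumptions introduced for illustration. Expanding a shortened URL would require following its HTTP redirects over the network, which is deliberately left out so that the sketch runs offline.

```python
import re
from urllib.parse import urlsplit

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")
SHORTENER_HOSTS = {"bit.ly", "t.co", "tinyurl.com"}  # illustrative, not exhaustive


def split_text_and_urls(message):
    """Separate the URLs from the surrounding text of a message."""
    urls = URL_PATTERN.findall(message)
    text = URL_PATTERN.sub(" ", message)
    return " ".join(text.split()), urls


def is_shortened(url):
    """Guess whether a URL is a shortened URL from its host name."""
    host = (urlsplit(url).hostname or "").lower()
    return host in SHORTENER_HOSTS


def first_detection_data(url):
    """Collect simple URL-derived fields (assumed contents of the first detection data)."""
    parts = urlsplit(url)
    return {"url": url, "host": parts.hostname, "path": parts.path}
```

A host-name allow-list is only one way to recognize shortened URLs; an implementation could instead issue a request and treat any redirect as an expansion.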

In step S105, the user terminal 100 requests a URL status determination from the detection server 200. The user terminal 100 transmits the first data preprocessed information and a URL status determination request message to the detection server 200 so that the detection server 200 can determine the status of the URL.

In step S107, the detection server 200 determines the URL status using the received information. The detection server 200 determines the status of the URL by comparing the domain name included in the received first data preprocessed information with the pre-stored URL information. That is, the detection server 200 searches the pre-stored URL information for the domain name or IP address and, if the same domain name or IP address is found, determines whether it corresponds to a normal URL or a malicious URL.

In step S109, the detection server 200 transmits the determined result to the user terminal 100. If the result indicates that the URL is normal, step S111 is performed; if the URL is malicious, step S113 is performed; and if the URL is a new URL, step S115 is performed.

In step S111, the user terminal 100 accesses the URL, connecting to its final destination, the web portal of the external server 400.

In step S113, the user terminal 100 blocks the URL. The user terminal 100 blocks access so that a connection to the URL is prevented in advance.

In step S115, the user terminal 100 requests a search for the URL through the web portal of the web portal server 300. The user terminal 100 requests that the web portal server 300 perform a search using the domain name included in the collected first detection data as the search term.

In step S117, the web portal server 300 performs a search on the web portal based on the received domain name. The web portal server 300 post-processes the search results. Here, the post-processing may include tag removal, derivation of analysis features, and the like.
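The post-processing mentioned in step S117 could, for instance, strip HTML tags from a search result and derive a simple analysis feature such as the malicious keyword count plotted in FIG. 9. The sketch below uses Python's standard `html.parser` module; the keyword list and the function names are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html):
    """Return the visible text of an HTML fragment with whitespace normalized."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


MALICIOUS_KEYWORDS = ("phishing", "scam", "malware")  # hypothetical keyword list


def keyword_count(text, keywords=MALICIOUS_KEYWORDS):
    """Count case-insensitive occurrences of the keywords in the text."""
    lowered = text.lower()
    return sum(lowered.count(k) for k in keywords)
```

Note that this simple extractor also keeps the contents of `script` and `style` elements; a fuller implementation would likely skip them.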

In step S119, the web portal server 300 transmits the search results to the user terminal 100.

In step S121, the user terminal 100 performs second data preprocessing using the search results. In the second data preprocessing, the user terminal 100 collects second detection data, which is additional data retrieved using the domain name.

In step S123, the user terminal 100 transmits the text data, the first data preprocessed information, the second data preprocessed information, and a URL status determination re-request message to the detection server 200.

In step S125, the detection server 200 performs third data preprocessing using the received information. The detection server 200 tokenizes the text message/email text and the search result text separately and then removes noise data (special characters). The detection server 200 performs stemming and removes unnecessary terms (words that carry no particular meaning, setting values, and the like). The detection server 200 pads the input to the expected text size and uses the LSTM model, which is the pre-trained first artificial intelligence model, to calculate an accuracy value for malicious text messages/emails.

In step S127, the detection server 200 reanalyzes the URL status. The detection server 200 re-determines the status of the URL by performing at least one of statistical analysis and analysis through an artificial intelligence model, based on the first data preprocessed information, the second data preprocessed information, the third data preprocessed information, and the learning information learned by the first artificial intelligence model. For the statistical analysis, the detection server 200 sets reference values for normal URLs and malicious URLs using statistical figures of the collected information, including the mean, minimum, maximum, variance, required values, and outliers, and determines whether the URL currently under inspection is malicious based on the set reference values. For the model-based analysis, the detection server 200 applies the collected information to the maximum entropy model, which is the pre-trained second artificial intelligence model, to determine whether the URL currently under inspection is malicious.
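For a two-class problem (normal versus malicious), a maximum entropy classifier takes the same form as logistic regression, so the model-based analysis of step S127 can be illustrated with a small logistic regression trained by stochastic gradient ascent. This is a sketch under stated assumptions and not the trained model of the embodiment: the two features (for example, a text accuracy value and a normalized keyword count, both scaled to the range 0 to 1), the learning rate, and the epoch count are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_maxent(samples, labels, lr=0.5, epochs=500):
    """Fit weights and a bias by stochastic gradient ascent on the log-likelihood.

    samples: list of feature vectors; labels: 1 for malicious, 0 for normal.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = y - p  # gradient of the log-likelihood with respect to the score
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias


def malicious_probability(x, weights, bias):
    """Estimated probability that feature vector x belongs to a malicious URL."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
```

A URL could then be judged malicious when `malicious_probability` exceeds a chosen cut-off such as 0.5; the cut-off is likewise an assumption of this sketch.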

In step S129, the detection server 200 updates its records with the determined result. The detection server 200 adds the determined result to the pre-stored URL information so that the URL information stays up to date. The detection server 200 may also train the first artificial intelligence model and the second artificial intelligence model on the determined result, so that the models learn the new URL information.

In step S131, the detection server 200 transmits the determined result to the user terminal 100. If the result indicates that the URL is normal, step S133 is performed; if the URL is malicious, step S135 is performed; and if the URL is a new URL, step S115 is performed.

In step S133, the user terminal 100 accesses the URL, connecting to its final destination, the web portal of the external server 400.

In step S135, the user terminal 100 blocks the URL. The user terminal 100 blocks access so that a connection to the URL is prevented in advance.

The method according to an embodiment of the present invention may also be provided in the form of a computer-readable medium suitable for storing computer program instructions and data. Such a computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination, and includes every kind of recording device in which data readable by a computer system is stored. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROM (Compact Disk Read Only Memory) and DVD (Digital Video Disk); magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM (Read Only Memory), RAM (Random Access Memory), and flash memory. The computer-readable recording medium may also be distributed over computer systems connected through a network, so that computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers in the technical field to which the present invention belongs.

Although the present invention has been described and illustrated above with reference to preferred embodiments that exemplify its technical idea, the present invention is not limited to the configuration and operation exactly as illustrated and described. Those skilled in the art will appreciate that many changes and modifications can be made to the present invention without departing from the scope of the technical idea. Accordingly, all such appropriate changes, modifications, and equivalents should be regarded as falling within the scope of the present invention.

10: communication unit
30: control unit
31: URL detection unit
33: data collection unit
35: URL blocking unit
50: output unit
70: storage unit
100: user terminal
200: detection server
210: server communication unit
230: server control unit
231: data management unit
233: data analysis unit
250: server storage unit
300: web portal server
400: external server
500: malicious URL detection system
550: communication network

Claims

A malicious URL detection method using artificial intelligence technology, the method comprising:
detecting, by a user terminal, whether a URL (uniform resource locator) is included in received text data;
performing, by the user terminal when the URL is included in the text data, first data preprocessing that separates the text and the URL, expands the URL to restore the original URL if the separated URL is a shortened URL, and collects first detection data related to the restored URL;
transmitting, by the user terminal, the first data preprocessed information and a URL status determination request message to a detection server;
determining, by the detection server, a status of the URL by comparing the information received from the user terminal with pre-stored URL information; and
transmitting, by the detection server, the determined result to the user terminal.

The malicious URL detection method using artificial intelligence technology of claim 1, further comprising:
accessing, by the user terminal, the URL when the determined result is normal; and
blocking, by the user terminal, the URL when the determined result is malicious.

The malicious URL detection method using artificial intelligence technology of claim 1, further comprising:
performing, by the user terminal when the determined result is a new URL, second data preprocessing that uses a web portal to additionally collect second detection data different from the first detection data;
transmitting, by the user terminal, the text data, the first data preprocessed information, the second data preprocessed information, and a URL status determination re-request message to the detection server;
performing, by the detection server, third data preprocessing that preprocesses the text included in the information received from the user terminal;
re-determining, by the detection server, the status of the URL by performing statistical analysis and analysis through a second artificial intelligence model, based on the first data preprocessed information, the second data preprocessed information, the third data preprocessed information, and learning information learned by a first artificial intelligence model; and
transmitting, by the detection server, the re-determined result to the user terminal.

The malicious URL detection method using artificial intelligence technology of claim 3, further comprising:
training, by the detection server, the first artificial intelligence model on text-related information; and
training, by the detection server, the second artificial intelligence model on URL-related information.

The malicious URL detection method using artificial intelligence technology of claim 3, further comprising:
accessing, by the user terminal, the URL when the re-determined result is normal; and
blocking, by the user terminal, the URL when the re-determined result is malicious.