KR102510777B1

KR102510777B1 - System and method for detecting counterfeit and falsification based on Artificial Intelligence

Info

Publication number: KR102510777B1
Application number: KR1020220060869A
Authority: KR
Inventors: 김대엽; 신삼신; 지승구
Original assignee: 한국인터넷진흥원
Priority date: 2022-05-18
Filing date: 2022-05-18
Publication date: 2023-03-16
Also published as: WO2023224202A1

Abstract

The present invention relates to an artificial intelligence-based website alteration detection system and method. More specifically, provided are the artificial intelligence-based website alteration detection system and method capable of determining whether there is an alteration of a website based on artificial intelligence. The artificial intelligence-based website alteration detection system includes a data collection unit and a website alteration detection unit.

Description

Artificial intelligence based website alteration detection system and method {System and method for detecting counterfeit and falsification based on Artificial Intelligence}

The present invention relates to an artificial intelligence-based website tampering detection system and method, and more particularly to a system and method capable of determining, based on artificial intelligence, whether a website has been tampered with.

Website forgery and tampering attacks include XSS (Cross Site Scripting) attacks, CSRF (Cross Site Request Forgery) attacks, phishing and pharming attacks, hosts file tampering attacks, DBD (Drive By Download) attacks, and MITM (Man In The Middle) attacks.

Among these, phishing and pharming attacks cause damage by using a phishing site that resembles a real bank site to induce users to enter personal information that can be used for financial transactions.

The term phishing is a compound of "private data" and "fishing". A phishing attack impersonates a public or financial institution, sends e-mail, and induces the recipient to access a fake bank site in order to steal financial information.

A pharming attack steals financial information by manipulating a user's PC that has been infected with malicious code. Even if the victim attempts to access a legitimate site, the victim is redirected to a forged or altered phishing site, where various financial information is stolen.

Because such forged or altered phishing sites resemble the real sites, it is difficult for users to judge for themselves whether a site has been forged or altered, and the need for a system that solves this problem is growing.

The present invention was created in response to the above need, and its object is to provide an artificial intelligence-based website tampering detection system and method capable of determining, based on artificial intelligence, whether a website has been tampered with.

To achieve the above object, the artificial intelligence-based website tampering detection system according to the present invention comprises: a data collection unit 110 that collects website identification information from a website 200 to be detected and collects website tampering information from a tampering information providing server 300; and a website tampering detection unit 120 that determines whether the website 200 to be detected has been tampered with, using the website identification information and the website tampering information collected by the data collection unit 110.

According to a preferred embodiment of the present invention, the data collection unit 110 extracts and collects, from the website 200 to be detected, website identification information including any one of Title, Logo, Favicon, Keyword, Copyright, and Footer Link.

According to a preferred embodiment of the present invention, the website tampering detection unit 120 comprises: an artificial intelligence learning unit 121 that generates an artificial intelligence learning model from the website tampering information using a pre-stored algorithm; and a website tampering determination unit 122 that determines whether the website 200 has been tampered with by applying the artificial intelligence learning model to the website identification information as an input variable.

According to a preferred embodiment of the present invention, the artificial intelligence learning unit 121 comprises: a pre-processing unit 121a that extracts feature information from the website tampering information using a pre-stored algorithm; a learning model generation unit 121b that performs learning on the feature information using a pre-stored algorithm to generate a plurality of artificial intelligence learning models; and a verification unit 121c that selects, from the plurality of artificial intelligence learning models, a learning model that meets a preset classification accuracy as the final artificial intelligence learning model.

According to a preferred embodiment of the present invention, the artificial intelligence learning model queries a search engine with the website identification information, extracts the domains returned in the query results, matches the extracted domains against the domain of the website 200 to be detected on a one-to-one, one-to-many, or many-to-many basis, generates domain matching result information according to whether they match, and transmits it to the tampering information providing server 300 or a user terminal 400.

According to a preferred embodiment of the present invention, the artificial intelligence learning model determines the website to be normal when an extracted domain matches the domain of the website 200 to be detected and malicious when it does not, and generates the domain matching result information accordingly.

According to a preferred embodiment of the present invention, the search engine includes any one of Google, Yahoo, Naver, Bing, Zum, Baidu, and Daum.

To achieve the above object, a website tampering detection method using the artificial intelligence-based website tampering detection system according to the present invention comprises: step A, in which the data collection unit 110 collects website identification information from the website 200 to be detected and collects website tampering information from the tampering information providing server 300; and step B, in which the website tampering detection unit 120 determines whether the website 200 to be detected has been tampered with, using the website identification information and the website tampering information collected by the data collection unit 110.

According to a preferred embodiment of the present invention, step B comprises: generating, by the artificial intelligence learning unit 121, an artificial intelligence learning model from the website tampering information using a pre-stored algorithm; and determining, by the website tampering determination unit 122, whether the website 200 has been tampered with by applying the artificial intelligence learning model to the website identification information as an input variable.

According to a preferred embodiment of the present invention, the step of generating the artificial intelligence learning model comprises: extracting, by the pre-processing unit 121a, feature information from the website tampering information using a pre-stored algorithm; performing, by the learning model generation unit 121b, learning on the feature information using a pre-stored algorithm to generate a plurality of artificial intelligence learning models; and selecting, by the verification unit 121c, a learning model that meets a preset classification accuracy from the plurality of artificial intelligence learning models as the final artificial intelligence learning model.

The artificial intelligence-based website tampering detection system and method according to the present invention have the effect of improving the accuracy of website tampering detection by using artificial intelligence.

FIG. 1 is a block diagram of a website tampering detection system according to an embodiment of the present invention.
FIG. 2 is a block diagram of a website tampering determination unit according to an embodiment of the present invention.
FIG. 3 is a block diagram of an artificial intelligence learning unit according to an embodiment of the present invention.
FIG. 4 is a flowchart of website tampering detection by an artificial intelligence-based website tampering detection system according to an embodiment of the present invention.
FIG. 5 is an example of website identification information collection by a data collection unit according to an embodiment of the present invention.
FIG. 6 is a flowchart of determining whether a website has been tampered with by a website tampering determination unit according to an embodiment of the present invention.
FIG. 7 is a flowchart of artificial intelligence learning model generation by an artificial intelligence learning unit according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those skilled in the art can readily practice the invention. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted for clarity, and similar reference numerals denote similar parts throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise.

The terms used in the embodiments of the present invention were selected, as far as possible, from general terms currently in wide use, in consideration of their functions in the present invention; however, they may vary depending on the intention of those skilled in the art, precedent, or the emergence of new technologies. In certain cases a term has been arbitrarily selected by the applicant, in which case its meaning is described in detail in the description of the corresponding embodiment. Therefore, the terms used in the embodiments should be defined based on their meaning and the overall content of the embodiments, not simply on their names.

In the embodiments of the present invention, terms including ordinal numbers such as "first" and "second" may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, a first component may be termed a second component, and similarly a second component may be termed a first component, without departing from the scope of the present invention. The term "and/or" includes any combination of a plurality of related listed items or any one of them.

In addition, in the embodiments of the present invention, singular expressions include plural expressions unless the context clearly indicates otherwise.

In addition, in the embodiments of the present invention, terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In addition, in the embodiments of the present invention, a 'module' or 'unit' performs at least one function or operation and may be implemented in hardware, in software, or in a combination of hardware and software. A plurality of 'modules' or 'units' may be integrated into at least one module and implemented by at least one processor, except for a 'module' or 'unit' that needs to be implemented in specific hardware. Hereinafter, some embodiments of the present invention are described with reference to the drawings.

FIG. 1 is a block diagram of an artificial intelligence-based website tampering detection system according to an embodiment of the present invention, FIG. 2 is a block diagram of a website tampering determination unit according to an embodiment of the present invention, and FIG. 3 is a block diagram of an artificial intelligence learning unit according to an embodiment of the present invention. As shown in FIGS. 1 to 3, the artificial intelligence-based website tampering detection system according to the present invention consists of a website tampering detection device 100, a website 200, a tampering information providing server 300, and a user terminal 400.

Each component (100, 200, 300, or 400) may include at least one processor. The at least one processor may be implemented as an ASIC (Application Specific Integrated Circuit), DSP (Digital Signal Processor), PLD (Programmable Logic Device), FPGA (Field Programmable Gate Array), CPU (Central Processing Unit), microcontroller, and/or microprocessor. Each component (100, 200, 300, or 400) may further include a memory. The memory may include a storage medium such as flash memory, a hard disk, an SSD (Solid State Disk), RAM (Random Access Memory), SRAM (Static Random Access Memory), ROM (Read Only Memory), PROM (Programmable Read Only Memory), EEPROM (Electrically Erasable and Programmable ROM), EPROM (Erasable and Programmable ROM), and/or an eMMC (embedded multimedia card).

In addition, the components (100, 200, 300, or 400) may be connected through a network. Here, a network means a connection structure in which nodes such as a plurality of terminals and servers can exchange information with one another. Examples of such a network include a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television networks. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), 5GPP (5th Generation Partnership Project), LTE (Long Term Evolution), WIMAX (World Interoperability for Microwave Access), Wi-Fi, the Internet, LAN (Local Area Network), Wireless LAN (Wireless Local Area Network), WAN (Wide Area Network), PAN (Personal Area Network), RF (Radio Frequency), Bluetooth networks, NFC (Near-Field Communication) networks, satellite broadcasting networks, analog broadcasting networks, and DMB (Digital Multimedia Broadcasting) networks.

The website tampering detection device 100 includes a data collection unit 110 that collects website identification information from the website 200 to be detected and collects website tampering information from the tampering information providing server 300, and a website tampering detection unit 120 that determines whether the website 200 to be detected has been tampered with, using the website identification information and the website tampering information collected by the data collection unit 110.

The website tampering detection unit 120 further includes an artificial intelligence learning unit 121 that generates an artificial intelligence learning model from the website tampering information using a pre-stored algorithm, and a website tampering determination unit 122 that determines whether the website 200 has been tampered with by applying the artificial intelligence learning model to the website identification information as an input variable.

The artificial intelligence learning unit 121 further includes a pre-processing unit 121a that extracts feature information from the website tampering information using a pre-stored algorithm, a learning model generation unit 121b that performs learning on the feature information using a pre-stored algorithm to generate a plurality of artificial intelligence learning models, and a verification unit 121c that selects, from the plurality of artificial intelligence learning models, a learning model that meets a preset classification accuracy as the final artificial intelligence learning model.

Hereinafter, the website tampering detection method using the artificial intelligence-based website tampering detection system configured as above is described in detail with reference to FIGS. 4 to 7.

FIG. 4 is a flowchart of a website tampering detection method using an artificial intelligence-based website tampering detection system according to an embodiment of the present invention. As shown in FIG. 4, the method consists of a step (S100) in which the data collection unit 110 collects website identification information from the website 200 to be detected and collects website tampering information from the tampering information providing server 300, and a step (S200) in which the website tampering detection unit 120 determines whether the website 200 to be detected has been tampered with, using the website identification information and the website tampering information collected by the data collection unit 110.

In step S100, the data collection unit 110 may extract website identification information by parsing the website 200 to be detected, may collect website identification information with login privileges based on pre-stored login account information, or may use a web crawler to traverse the site repeatedly from the top page down to its sub pages and collect website identification information.
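The crawler-based option in step S100 can be sketched roughly as a breadth-first traversal that starts at the top page and follows links to sub pages on the same host. This is only an illustration under stated assumptions: the `crawl` function, the `fetch` callable, and the in-memory `SITE` dictionary are hypothetical names introduced here so that the sketch runs without network access; the specification does not describe the crawler at this level of detail.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(top_url, fetch, max_pages=50):
    """Breadth-first traversal from the top page to same-host sub pages.

    `fetch` is a hypothetical callable mapping a URL to HTML text, or None
    when the page is unavailable. Returns a dict of visited URL -> HTML.
    """
    host = urlparse(top_url).netloc
    queue = deque([top_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in pages:
            continue
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            nxt = urljoin(url, href)
            # Stay on the target host; off-site links are ignored.
            if urlparse(nxt).netloc == host and nxt not in pages:
                queue.append(nxt)
    return pages


# In-memory stand-in for a website (assumption, for illustration only).
SITE = {
    "https://bank.example/": '<a href="/login">Login</a> <a href="https://other.example/">x</a>',
    "https://bank.example/login": '<a href="/">Home</a> <a href="/help">Help</a>',
    "https://bank.example/help": "<p>Help</p>",
}
pages = crawl("https://bank.example/", SITE.get)
```

In a real deployment `fetch` would issue HTTP requests, optionally with the stored login session mentioned above, and the collected pages would then be passed to the identification-information extraction.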

That is, through the above methods, the data collection unit 110 extracts and collects, from the website 200 to be detected, website identification information including any one of Title, Logo, Favicon, Keyword, Copyright, and Footer Link, as illustrated in FIG. 5. FIG. 5 is an example of website identification information collection by the data collection unit according to an embodiment of the present invention. As shown, the data collection unit 110 extracts and collects the Title as in FIG. 5A, the Logo and Favicon as in FIG. 5B, the Footer Link as in FIG. 5C, and the Copyright as in FIG. 5D.
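As a minimal sketch of how some of these fields might be pulled out of a page, the following uses Python's standard `html.parser`. The class `IdentityParser`, the function `extract_identity`, the returned dictionary layout, and the sample HTML are assumptions made for illustration; Logo extraction is left out because it requires image handling that the specification does not detail.

```python
from html.parser import HTMLParser


class IdentityParser(HTMLParser):
    """Extracts a few identification fields from one HTML page."""

    def __init__(self):
        super().__init__()
        self.info = {"title": "", "favicon": None, "keywords": [],
                     "footer_links": [], "copyright": None}
        self._in_title = False
        self._in_footer = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "link" and "icon" in (a.get("rel") or "").lower().split():
            self.info["favicon"] = a.get("href")
        elif tag == "meta" and (a.get("name") or "").lower() == "keywords":
            content = a.get("content") or ""
            self.info["keywords"] = [k.strip() for k in content.split(",") if k.strip()]
        elif tag == "footer":
            self._in_footer = True
        elif tag == "a" and self._in_footer and a.get("href"):
            self.info["footer_links"].append(a["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "footer":
            self._in_footer = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_title:
            self.info["title"] += text
        elif text and ("©" in text or "copyright" in text.lower()):
            # Simple heuristic (assumption): any text node mentioning copyright.
            self.info["copyright"] = text


def extract_identity(html):
    """Parse one page and return its identification fields as a dict."""
    parser = IdentityParser()
    parser.feed(html)
    parser.close()
    return parser.info


SAMPLE = """<html><head><title>Example Bank</title>
<link rel="shortcut icon" href="/favicon.ico">
<meta name="keywords" content="bank, loans, savings"></head>
<body><footer><a href="/privacy">Privacy</a><a href="/terms">Terms</a>
<p>Copyright 2022 Example Bank. All rights reserved.</p></footer></body></html>"""

info = extract_identity(SAMPLE)
```

The copyright heuristic here is deliberately crude; a production extractor would likely restrict it to the footer region and normalize the text before using it as a search query.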

Next, step S200 is described in more detail with reference to FIG. 6. FIG. 6 is a flowchart of determining whether a website has been tampered with by the website tampering determination unit according to an embodiment of the present invention. As shown in FIG. 6, step S200 first includes a step (S210) in which the artificial intelligence learning unit 121 generates an artificial intelligence learning model from the website tampering information using a pre-stored algorithm.

Step S210 is described in more detail with reference to FIG. 7. FIG. 7 is a flowchart of artificial intelligence learning model generation by the artificial intelligence learning unit according to an embodiment of the present invention. As shown in FIG. 7, step S210 consists of a step (S211) in which the pre-processing unit 121a extracts feature information from the website tampering information using a pre-stored algorithm, a step (S212) in which the learning model generation unit 121b performs learning on the feature information using a pre-stored algorithm to generate a plurality of artificial intelligence learning models, and a step (S213) in which the verification unit 121c selects, from the plurality of artificial intelligence learning models, a learning model that meets a preset classification accuracy as the final artificial intelligence learning model.

In step S211, the pre-processing unit 121a extracts feature information from the website tampering information using a pre-stored algorithm, which may be OpenCV, Pillow, Scikit-image, OCR (Optical Character Recognition), or Logo.

Next, in step S212, the learning model generation unit 121b performs learning on the feature information using a pre-stored algorithm to generate a plurality of artificial intelligence learning models. The pre-stored algorithm may be a machine learning algorithm or a deep learning algorithm. In one embodiment, the deep learning algorithm may be a convolutional neural network (CNN) algorithm.

Finally, in step S213, the verification unit 121c selects, from the plurality of artificial intelligence learning models, a learning model that meets a preset classification accuracy as the final artificial intelligence learning model. To do so, learning may be performed on a labeled training dataset using a pre-stored algorithm, and a learning model that meets the preset classification accuracy may be selected as the final artificial intelligence learning model.
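The generate-then-verify flow of steps S212 and S213 can be illustrated with a toy example: several candidate models are trained, each is scored on labeled data, and one that meets the preset accuracy is selected. Everything below is an assumption made for illustration: the one-feature threshold classifiers stand in for the machine learning or CNN models named in the specification, the two numeric features and the label convention (1 = tampered) are invented, and choosing the most accurate passing model is one possible reading of "meets a preset classification accuracy".

```python
def accuracy(model, samples):
    """Fraction of labeled (features, label) samples the model classifies correctly."""
    correct = sum(1 for x, y in samples if model(x) == y)
    return correct / len(samples)


def train_threshold(train, feature_index):
    """Toy 'learning': pick the threshold on one feature that best fits the labels."""
    values = sorted({x[feature_index] for x, _ in train})
    best_t, best_acc = values[0], -1.0
    for t in values:
        candidate = lambda x, t=t: int(x[feature_index] >= t)
        acc = accuracy(candidate, train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x: int(x[feature_index] >= best_t)


def select_final_model(candidates, validation, required_accuracy):
    """Return (name, accuracy) of the best candidate meeting the preset accuracy, else None."""
    scored = [(accuracy(m, validation), name) for name, m in candidates.items()]
    passing = [s for s in scored if s[0] >= required_accuracy]
    if not passing:
        return None
    best_acc, best_name = max(passing)
    return best_name, best_acc


# Hypothetical labeled data: (feature_0, feature_1) -> 1 if tampered, 0 if normal.
train = [((0.1, 0.9), 0), ((0.2, 0.1), 0), ((0.8, 0.2), 1), ((0.9, 0.8), 1)]
validation = [((0.15, 0.5), 0), ((0.85, 0.4), 1), ((0.3, 0.95), 0), ((0.82, 0.05), 1)]

candidates = {
    "feature_0": train_threshold(train, 0),
    "feature_1": train_threshold(train, 1),
}
result = select_final_model(candidates, validation, required_accuracy=0.9)
```

With this data the model trained on the first feature separates the validation samples and is selected, while the model trained on the second feature falls below the required accuracy and is discarded.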

Step S200 further includes a step (S220) in which the website tampering determination unit 122 determines whether the website 200 has been tampered with by applying the artificial intelligence learning model generated through steps S211 to S213 to the website identification information as an input variable.

In step S220, the artificial intelligence learning model queries a search engine with the website identification information, extracts the domains returned in the query results, matches the extracted domains against the domain of the website 200 to be detected on a one-to-one, one-to-many, or many-to-many basis, and generates domain matching result information according to whether they match.

The search engine may include any one of Google, Yahoo, Naver, Bing, Zum, Baidu, and Daum.

The artificial intelligence learning model determines that the website is normal if the extracted domain matches the domain of the detection target website 200 and malicious if it does not, generates domain matching result information, and transmits it to the alteration information providing server 300 or the user terminal 400. The domain matching result information provided to the alteration information providing server 300 may be used as website alteration information for generating the artificial intelligence learning model.
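As an illustration of the domain matching result information, the following sketch builds a record containing the normal or malicious verdict and serializes it for transmission. The field names, the JSON encoding, and the function names `domain_matching_result` and `serialize` are assumptions made for this example; the specification does not define a record format or a transport. Following the rule stated above, an empty list of searched domains is treated as a mismatch.

```python
import json
from typing import Dict, List


def domain_matching_result(target_domain: str, searched_domains: List[str]) -> Dict[str, object]:
    """Build a result record: 'normal' if the target domain appears among
    the searched domains, otherwise 'malicious'."""
    target = target_domain.lower()
    searched = sorted({d.lower() for d in searched_domains})
    matched = target in searched
    return {
        "target_domain": target,
        "searched_domains": searched,
        "matched": matched,
        "verdict": "normal" if matched else "malicious",
    }


def serialize(result: Dict[str, object]) -> str:
    """Encode the record as JSON text, e.g. for sending to a server or terminal."""
    return json.dumps(result, ensure_ascii=False, sort_keys=True)
```

Records serialized this way could also be stored by the receiving server and later reused as labeled training data.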

Accordingly, as described above, the artificial intelligence-based website alteration detection system according to the present invention can improve the accuracy of alteration detection by using artificial intelligence to detect website alteration.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention, as defined in the following claims, also fall within the scope of the present invention.

Those of ordinary skill in the art related to this embodiment will understand that it may be implemented in modified forms without departing from the essential characteristics of the above description. The disclosed methods should therefore be considered in an illustrative rather than a limiting sense. The scope of the present invention is set forth in the claims rather than in the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

100: website alteration detection device
110: data collection unit
120: website alteration detection unit
121: artificial intelligence learning unit
121a: pre-processing unit
121b: learning model generation unit
121c: verification unit
122: website alteration determination unit
200: website
300: alteration information providing server
400: user terminal

Claims

a data collection unit 110 that collects website identification information from the detection target website 200 and website alteration information from the alteration information providing server 300; and
a website alteration detection unit 120 that determines whether the detection target website 200 has been altered, based on the website identification information and the website alteration information collected by the data collection unit 110;
Including,
wherein the data collection unit 110 parses the detection target website 200 to extract website identification information, collects website identification information with login authority based on pre-stored login account information, or uses a web crawler to extract and collect website identification information including any one of a title, logo, favicon, keyword, copyright, and footer link,
The website alteration detection unit 120,
an artificial intelligence learning unit 121 that generates an artificial intelligence learning model using a pre-stored algorithm based on the website alteration information; and
a website alteration determination unit 122 that determines whether the website 200 has been altered using the artificial intelligence learning model, with the website identification information as an input variable;
Including,
The artificial intelligence learning unit 121,
a pre-processing unit 121a that extracts feature information from the website alteration information using any one of pre-stored algorithms such as OpenCV, Pillow, Scikit-image, OCR (Optical Character Recognition), and Logo;
a learning model generation unit 121b that generates a plurality of artificial intelligence learning models by performing learning based on the feature information using a pre-stored algorithm that is either a machine learning algorithm or a deep learning algorithm; and
a verification unit 121c that performs learning based on a labeled training dataset using a pre-stored algorithm and selects a learning model that meets a preset classification accuracy as a final artificial intelligence learning model;
An artificial intelligence-based website alteration detection system comprising the foregoing.

delete

According to claim 1,
The artificial intelligence learning model,
queries a search engine with the website identification information to extract the domains returned as the query result, matches the extracted domains against the domain of the detection target website 200 on a one-to-one, one-to-many, or many-to-many basis, generates domain matching result information according to whether they match, and transmits it to the alteration information providing server 300 or the user terminal 400; an artificial intelligence-based website alteration detection system characterized by the foregoing.

According to claim 5,
The artificial intelligence learning model,
determines that the website is normal if the extracted domain matches the domain of the detection target website 200 and malicious if it does not, and generates domain matching result information; an artificial intelligence-based website alteration detection system characterized by the foregoing.

According to claim 5,
The search engine,
includes any one of Google, Yahoo, Naver, Bing, Zum, Baidu, and Daum; an artificial intelligence-based website alteration detection system characterized by the foregoing.

Step A in which the data collection unit 110 collects website identification information from the detection target website 200 and website alteration information from the alteration information providing server 300; and
Step B in which the website alteration detection unit 120 determines whether the detection target website 200 has been altered, based on the website identification information and the website alteration information collected by the data collection unit 110;
Including,
wherein the data collection unit 110 parses the detection target website 200 to extract website identification information, collects website identification information with login authority based on pre-stored login account information, or uses a web crawler to extract and collect website identification information including any one of a title, logo, favicon, keyword, copyright, and footer link,
In step B,
generating, by the artificial intelligence learning unit 121, an artificial intelligence learning model using a pre-stored algorithm based on the website alteration information; and
determining, by the website alteration determination unit 122, whether the website 200 has been altered using the artificial intelligence learning model, with the website identification information as an input variable;
Including,
The step of generating the artificial intelligence learning model,
extracting, by the pre-processing unit 121a, feature information from the website alteration information using any one of pre-stored algorithms such as OpenCV, Pillow, Scikit-image, OCR (Optical Character Recognition), and Logo;
generating, by the learning model generation unit 121b, a plurality of artificial intelligence learning models by performing learning based on the feature information using a pre-stored algorithm that is either a machine learning algorithm or a deep learning algorithm; and
performing, by the verification unit 121c, learning based on a labeled training dataset using a pre-stored algorithm, and selecting a learning model that meets a preset classification accuracy as a final artificial intelligence learning model;
A website alteration detection method using an artificial intelligence-based website alteration detection system, comprising the foregoing.

delete

According to claim 8,
The artificial intelligence learning model,
queries a search engine with the website identification information to extract the domains returned as the query result, matches the extracted domains against the domain of the detection target website 200 on a one-to-one, one-to-many, or many-to-many basis, generates domain matching result information according to whether they match, and transmits it to the alteration information providing server 300 or the user terminal 400; a website alteration detection method using an artificial intelligence-based website alteration detection system, characterized by the foregoing.

According to claim 12,
The artificial intelligence learning model,
determines that the website is normal if the extracted domain matches the domain of the detection target website 200 and malicious if it does not, and generates domain matching result information; a website alteration detection method using an artificial intelligence-based website alteration detection system, characterized by the foregoing.

According to claim 12,
The search engine,
includes any one of Google, Yahoo, Naver, Bing, Zum, Baidu, and Daum; a website alteration detection method using an artificial intelligence-based website alteration detection system, characterized by the foregoing.