KR102671436B1

KR102671436B1 - Device, method and program for evaluating security reports based on artificial intelligence

Info

Publication number: KR102671436B1
Application number: KR1020240014780A
Authority: KR
Inventors: 김오중; 이채영; 윤선호; 한영석; 임승혁
Original assignee: 파인더갭 주식회사
Priority date: 2023-11-22
Filing date: 2024-01-31
Publication date: 2024-05-31

Abstract

This disclosure relates to an artificial intelligence-based security report evaluation device that receives a security report through a communication unit, inspects the received security report with an artificial intelligence model using at least one preset inspection method, and determines, based on the inspection result, whether the received security report is a duplicate.

Description

Device, method and program for evaluating security reports based on artificial intelligence

This disclosure relates to a security report evaluation device and, more specifically, to a device that can evaluate security reports using an artificial intelligence model.

A bug bounty platform receives security vulnerability reports from participants, evaluates their validity, and pays rewards based on the severity of each vulnerability.

Because a participant writing a security vulnerability report cannot see what other participants have reported, duplicate reports with the same content are submitted frequently.

A bug bounty platform gives reward priority to the participant who reports a security vulnerability first, so reports must be reviewed for duplicates, and this review consumes considerable manpower and time.

Because this is a problem for participants as well as for the bug bounty platform, a technology that can automatically check security reports for duplication is needed, but no such technology has been disclosed to date.

Republic of Korea Patent Publication No. 10-2023-0125653 (August 29, 2023)

Embodiments disclosed herein seek to provide an artificial intelligence-based security report evaluation device.

The problems to be solved by the present disclosure are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

An artificial intelligence-based security report evaluation device according to an embodiment of the present disclosure for solving the above-described problem includes: a communication unit; a memory storing at least one instruction; and a processor, wherein the processor executes the at least one instruction to receive a security report through the communication unit, inspect the received security report with an artificial intelligence model using at least one preset inspection method, and determine, based on the inspection result, whether the received security report is a duplicate.

In addition, the processor may summarize the content of the received security report using the artificial intelligence model, calculate a semantic similarity for the summarized content, and calculate the degree of duplication of the received security report based on the calculated similarity.
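The summarize-then-compare step above can be sketched as follows. This is only an illustration under stated assumptions: the disclosure does not specify an embedding or a similarity measure, so the bag-of-words `embed` function and cosine similarity stand in for whatever language model is actually used, and the names `embed`, `cosine_similarity`, and `duplication_degree` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding (assumption): lowercase bag-of-words token counts.

    A real system would use a learned sentence embedding instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def duplication_degree(new_summary: str, existing_summaries: list[str]) -> float:
    """Highest similarity between the new summary and any stored summary."""
    new_vec = embed(new_summary)
    return max(
        (cosine_similarity(new_vec, embed(s)) for s in existing_summaries),
        default=0.0,
    )
```

Taking the maximum over stored summaries is one possible design choice: a report counts as highly duplicated if it closely matches any single earlier report.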

In addition, the processor may calculate the similarity by performing a Semantic Textual Similarity (STS) task based on the artificial intelligence model, apply an additional weight according to the calculated similarity using a loss function, and then calculate the degree of duplication.

In addition, the processor may use a data augmentation technique to increase the amount and diversity of data of the received security report.

In addition, the processor may translate the security report written in a first language into at least one other language, translate the security report thus translated back into the first language to generate a re-translated security report, and inspect the re-translated security report with the artificial intelligence model using the at least one preset inspection method.
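The round-trip translation described above is commonly called back-translation. A minimal sketch, assuming a pluggable `translate(text, source_lang, target_lang)` callable, might look like this. The disclosure does not name a translation service, so the `Translator` type, the function names, and the toy word-substitution translator are all hypothetical.

```python
from typing import Callable

# Hypothetical translator signature: (text, source_lang, target_lang) -> text
Translator = Callable[[str, str, str], str]


def back_translate(
    report: str,
    translate: Translator,
    source_lang: str,
    pivot_langs: list[str],
) -> list[str]:
    """Return one re-translated variant of the report per pivot language."""
    variants = []
    for pivot in pivot_langs:
        pivot_text = translate(report, source_lang, pivot)
        variants.append(translate(pivot_text, pivot, source_lang))
    return variants


def toy_translate(text: str, source_lang: str, target_lang: str) -> str:
    """Toy word-substitution translator, for demonstration only."""
    table = {
        ("en", "de"): {"bug": "Fehler"},
        ("de", "en"): {"Fehler": "defect"},
    }
    mapping = table.get((source_lang, target_lang), {})
    return " ".join(mapping.get(word, word) for word in text.split())


variants = back_translate("login bug", toy_translate, "en", ["de"])
```

Each variant keeps the meaning of the original report while varying its wording, which is what makes the re-translated text useful as augmented input to the inspection step.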

In addition, the processor may extract a plurality of keywords from the security report based on the report target, report purpose, and report content of the security report, obtain the report target, report purpose, and report content that describe the security report using an explainable artificial intelligence model, and, based on the obtained report target, report purpose, and report content, select from a plurality of data columns at least one data column to be used in determining whether the security report is a duplicate.

In addition, the plurality of data columns include the report title, the location where the security vulnerability was found, scope, attack point, payload, attack type, attack impact, an evaluation of the user who wrote the security report, vulnerability description, remediation measures, company name, program name, and submission date and time.
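The thirteen data columns listed above, together with a column-selection step, can be sketched as a simple record type. The field names are illustrative English renderings of the listed columns, and the selection rule (keep any filled-in column that mentions an extracted keyword) is an assumption made for this example; the disclosure leaves the actual selection to the explainable artificial intelligence model.

```python
from dataclasses import dataclass, fields


@dataclass
class ReportColumns:
    """One field per data column named in the disclosure (names are illustrative)."""
    title: str = ""
    discovery_location: str = ""
    scope: str = ""
    attack_point: str = ""
    payload: str = ""
    attack_type: str = ""
    attack_impact: str = ""
    reporter_rating: str = ""
    vulnerability_description: str = ""
    remediation: str = ""
    company_name: str = ""
    program_name: str = ""
    submitted_at: str = ""


def select_columns(report: ReportColumns, keywords: set[str]) -> list[str]:
    """Hypothetical rule: keep non-empty columns that mention any keyword."""
    selected = []
    for field in fields(report):
        value = getattr(report, field.name).lower()
        if value and any(keyword.lower() in value for keyword in keywords):
            selected.append(field.name)
    return selected


report = ReportColumns(
    title="Stored XSS in profile page",
    payload="<script>alert(1)</script>",
    company_name="Acme",
)
chosen = select_columns(report, {"xss", "script"})
```

Here `chosen` contains `title` and `payload`, the two columns whose text matches a keyword; only those columns would then be compared against earlier reports.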

In addition, the processor may request from the explainable artificial intelligence model a summarized description of the security report that includes the obtained report target, report purpose, and report content, and inspect the summarized description obtained from the explainable artificial intelligence model using the at least one preset inspection method.

In addition, an artificial intelligence-based security report evaluation method according to an embodiment of the present disclosure for solving the above-described problem is performed by a device and includes: receiving a security report through a communication unit; inspecting the received security report with an artificial intelligence model using at least one preset inspection method; and determining, based on the inspection result, whether the received security report is a duplicate.
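The three steps of the method (receive, inspect, determine) can be sketched as a small pipeline. This is a sketch under assumptions: the `Inspector` signature, the `exact_match` inspector, and the 0.8 threshold are hypothetical placeholders, since the disclosure only says that at least one preset inspection method is applied.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical inspector signature: (report, known_reports) -> score in [0, 1]
Inspector = Callable[[str, Sequence[str]], float]


@dataclass
class Verdict:
    score: float
    is_duplicate: bool


def exact_match(report: str, known_reports: Sequence[str]) -> float:
    """Simplest possible inspector: case- and whitespace-insensitive equality."""
    normalized = " ".join(report.lower().split())
    matches = (normalized == " ".join(k.lower().split()) for k in known_reports)
    return 1.0 if any(matches) else 0.0


def evaluate_report(
    report: str,
    known_reports: Sequence[str],
    inspectors: Sequence[Inspector],
    threshold: float = 0.8,  # assumed cut-off, not taken from the disclosure
) -> Verdict:
    """Run every preset inspector and flag a duplicate if any score reaches the threshold."""
    score = max((inspect(report, known_reports) for inspect in inspectors), default=0.0)
    return Verdict(score=score, is_duplicate=score >= threshold)
```

Passing the inspectors as a list keeps the pipeline independent of any particular inspection method, so a model-based similarity check could be added alongside `exact_match` without changing `evaluate_report`.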

In addition, a computer program stored in a computer-readable recording medium for executing a method for implementing the present disclosure may be further provided.

In addition, a computer-readable recording medium recording a computer program for executing a method for implementing the present disclosure may be further provided.

According to the above-described means of the present disclosure, security reports can be evaluated automatically based on artificial intelligence.

The effects of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

Figure 1 is a schematic diagram of an artificial intelligence-based security report evaluation system according to an embodiment of the present disclosure.
Figure 2 is a block diagram of an artificial intelligence-based security report evaluation device according to an embodiment of the present disclosure.
Figure 3 is a flowchart of an artificial intelligence-based security report evaluation method according to an embodiment of the present disclosure.
Figures 4 and 5 are example diagrams for explaining an embodiment of the present disclosure.
Figure 6 is a diagram illustrating translating a security report into another language, then translating it back to the original language, and checking for duplication using the re-translated security report.
Figure 7 is a diagram illustrating obtaining information about a security report using an explainable artificial intelligence model and using that information to determine the data columns to be used in analyzing the security report.
Figure 8 is a diagram illustrating each item included in a plurality of data columns.
Figure 9 is a diagram illustrating summarizing a security report using an explainable artificial intelligence model and checking for duplicates using the summary.

Like reference numerals refer to like elements throughout this disclosure. This disclosure does not describe every element of the embodiments, and content that is general in the technical field to which this disclosure pertains or that overlaps between embodiments is omitted. The terms 'unit, module, member, block' used in the specification may be implemented in software or hardware, and depending on the embodiment, a plurality of 'units, modules, members, blocks' may be implemented as a single component, or a single 'unit, module, member, block' may include a plurality of components.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only a direct connection but also an indirect connection, and an indirect connection includes a connection through a wireless communication network.

In addition, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them.

Throughout the specification, when a member is said to be located "on" another member, this includes not only the case where the member is in contact with the other member but also the case where a further member exists between the two.

Terms such as first and second are used to distinguish one component from another, and the components are not limited by these terms.

Singular expressions include plural expressions unless the context clearly indicates otherwise.

The identification codes of the steps are used for convenience of description and do not describe the order of the steps; unless a specific order is clearly stated in context, the steps may be performed in an order different from the one specified.

Hereinafter, the operating principle and embodiments of the present disclosure are described with reference to the accompanying drawings.

In this specification, the 'security report evaluation device 100 according to the present disclosure' encompasses all of the various devices that can perform computation and provide results to a user. For example, the security report evaluation device 100 according to the present disclosure may include all of a computer, a server device, and a portable terminal, or may take any one of these forms.

In an embodiment of the present disclosure, a user means an evaluator who reports a security vulnerability.

Here, the computer may include, for example, a notebook, desktop, laptop, tablet PC, or slate PC equipped with a web browser.

The server device is a server that processes information by communicating with external devices, and may include an application server, computing server, database server, file server, game server, mail server, proxy server, web server, and the like.

The portable terminal is, for example, a wireless communication device that guarantees portability and mobility, and may include all kinds of handheld wireless communication devices, such as PCS, GSM, PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (Wideband Code Division Multiple Access), and WiBro (Wireless Broadband Internet) terminals and smartphones, as well as wearable devices such as watches, rings, bracelets, anklets, necklaces, glasses, contact lenses, or head-mounted devices (HMDs).

Functions related to artificial intelligence according to the present disclosure are operated through the processor 110 and a storage unit. The processor 110 may consist of one or more processors. The one or more processors may be general-purpose processors such as a CPU, AP, or DSP (Digital Signal Processor), graphics-dedicated processors such as a GPU or VPU (Vision Processing Unit), or artificial intelligence-dedicated processors such as an NPU. The one or more processors control input data to be processed according to predefined operation rules or an artificial intelligence model stored in the storage unit. Alternatively, when the one or more processors are artificial intelligence-dedicated processors, they may be designed with a hardware structure specialized for processing a specific artificial intelligence model.

The predefined operation rules or artificial intelligence model are created through learning. Here, being created through learning means that a base artificial intelligence model is trained by a learning algorithm on a large amount of training data, thereby producing predefined operation rules or an artificial intelligence model set to perform a desired characteristic (or purpose). This learning may be performed in the device itself on which the artificial intelligence according to the present disclosure runs, or through a separate server and/or system 10. Examples of learning algorithms include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning, but are not limited to these.

An artificial intelligence model may consist of a plurality of neural network layers. Each of the layers has a plurality of weights and performs a neural network operation on the operation result of the previous layer and its weights. The weights of the layers may be optimized by the training results of the artificial intelligence model. For example, the weights may be updated during training so that the loss value or cost value obtained from the artificial intelligence model is reduced or minimized. The artificial neural network may include a deep neural network (DNN), for example a CNN (Convolutional Neural Network), DNN (Deep Neural Network), RNN (Recurrent Neural Network), RBM (Restricted Boltzmann Machine), DBN (Deep Belief Network), BRDNN (Bidirectional Recurrent Deep Neural Network), or deep Q-network, but is not limited to these examples.
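The idea that weights are updated so that a loss value decreases can be illustrated with the smallest possible case: a single linear unit trained by gradient descent on a mean squared error loss. The disclosure does not prescribe this loss, optimizer, or learning rate; they are assumptions chosen to keep the example short, and the function names are hypothetical.

```python
def mse(w: float, b: float, xs: list[float], ys: list[float]) -> float:
    """Mean squared error of the prediction w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_linear(
    xs: list[float],
    ys: list[float],
    lr: float = 0.05,
    epochs: int = 500,
) -> tuple[float, float]:
    """Fit y = w * x + b by repeatedly stepping against the loss gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2x + 1
w, b = train_linear(xs, ys)
```

After training, `w` and `b` are close to 2 and 1, and the loss is far below its starting value; a multi-layer network applies the same update rule to every weight in every layer.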

According to an exemplary embodiment of the present disclosure, the processor 110 may implement artificial intelligence. Artificial intelligence here means a machine learning method based on artificial neural networks that lets a machine learn by imitating human biological neurons. By learning method, artificial intelligence methodologies can be divided into supervised learning, in which input data and output data are provided together as training data so that the answer (output data) to the problem (input data) is fixed; unsupervised learning, in which only input data is provided without output data so that the answer to the problem is not fixed; and reinforcement learning, in which a reward is given from the external environment whenever an action is taken in the current state and learning proceeds in the direction that maximizes this reward. Artificial intelligence methodologies can also be divided by architecture, that is, the structure of the learning model; widely used deep learning architectures include convolutional neural networks, recurrent neural networks, transformers, and generative adversarial networks.

The device 100 may include an artificial intelligence model. The artificial intelligence model may be a single model or may be implemented as a plurality of models. The artificial intelligence model may consist of a neural network (or artificial neural network) and may include statistical learning algorithms, used in machine learning and cognitive science, that mimic biological neurons. A neural network may refer to a model in general in which artificial neurons (nodes), forming a network through synaptic connections, acquire problem-solving ability by changing the strength of those connections through learning. A neuron of a neural network may include a combination of weights or biases. A neural network may include one or more layers consisting of one or more neurons or nodes. For example, the device 100 may include an input layer, a hidden layer, and an output layer. The neural network constituting the device 100 can infer a desired result from arbitrary input by changing the weights of its neurons through learning.
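The input, hidden, and output layer structure described above can be made concrete with a forward pass through a tiny network. The layer sizes, the ReLU and sigmoid activations, and the weight values are arbitrary choices for illustration; the disclosure does not fix any of them.

```python
import math
from typing import Callable


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def relu(z: float) -> float:
    return max(0.0, z)


def dense(
    inputs: list[float],
    weights: list[list[float]],
    biases: list[float],
    activation: Callable[[float], float],
) -> list[float]:
    """One layer: each neuron applies its weights and bias, then the activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(weights, biases)
    ]


def forward(
    x: list[float],
    hidden_w: list[list[float]],
    hidden_b: list[float],
    out_w: list[list[float]],
    out_b: list[float],
) -> list[float]:
    """Input layer -> hidden layer (ReLU) -> output layer (sigmoid)."""
    hidden = dense(x, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, sigmoid)


output = forward(
    [1.0, 2.0],
    hidden_w=[[1.0, -1.0], [0.5, 0.5]],
    hidden_b=[0.0, 0.0],
    out_w=[[1.0, 2.0]],
    out_b=[-3.0],
)
```

With these values the hidden layer produces `[0.0, 1.5]` and the single output neuron receives a pre-activation of 0, so `output` is `[0.5]`. Training would adjust the weight matrices so that the output moves toward the desired result.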

The processor 110 may generate a neural network, train (or learn) a neural network, perform an operation based on received input data and generate an information signal based on the result, or retrain a neural network. Neural network models may include various kinds of models, such as CNNs like GoogLeNet, AlexNet, and VGG Network, as well as R-CNN, RPN, RNN, S-DNN, S-SDNN, Deconvolution Network, DBN, RBM, Fully Convolutional Network, LSTM Network, and Classification Network, but are not limited to these. The processor 110 may include one or more processors for performing operations according to the neural network models. For example, the neural network may include a deep neural network.

The neural network may include, but is not limited to, a CNN, RNN, perceptron, multilayer perceptron, FF (Feed Forward), RBF (Radial Basis Function network), DFF (Deep Feed Forward), LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit), AE (Auto Encoder), VAE (Variational Auto Encoder), DAE (Denoising Auto Encoder), SAE (Sparse Auto Encoder), MC (Markov Chain), HN (Hopfield Network), BM (Boltzmann Machine), RBM (Restricted Boltzmann Machine), DBN (Deep Belief Network), DCN (Deep Convolutional Network), DN (Deconvolutional Network), DCIGN (Deep Convolutional Inverse Graphics Network), GAN (Generative Adversarial Network), LSM (Liquid State Machine), ELM (Extreme Learning Machine), ESN (Echo State Network), DRN (Deep Residual Network), DNC (Differentiable Neural Computer), NTM (Neural Turing Machine), CN (Capsule Network), KN (Kohonen Network), and AN (Attention Network); those skilled in the art will understand that it may include any neural network.

According to an exemplary embodiment of the present disclosure, the processor 110 may use various artificial intelligence structures and algorithms, including but not limited to: CNNs (Convolutional Neural Networks) such as GoogLeNet, AlexNet, and VGG Network; R-CNN (Region with Convolutional Neural Network), RPN (Region Proposal Network), RNN (Recurrent Neural Network), S-DNN (Stacking-based Deep Neural Network), S-SDNN (State-Space Dynamic Neural Network), Deconvolution Network, DBN (Deep Belief Network), RBM (Restricted Boltzmann Machine), Fully Convolutional Network, LSTM (Long Short-Term Memory) Network, Classification Network, Generative Modeling, eXplainable AI, Continual AI, Representation Learning, and AI for Material Design; BERT, SP-BERT, MRC/QA, Text Analysis, Dialog System, GPT-3, and GPT-4 for natural language processing; Visual Analytics, Visual Understanding, Video Synthesis, and ResNet for vision processing; and Anomaly Detection, Prediction, Time-Series Forecasting, Optimization, Recommendation, and Data Creation for data intelligence. Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings.

Figure 1 is a schematic diagram of an artificial intelligence-based security report evaluation system 10 according to an embodiment of the present disclosure.

Referring to Figure 1, the artificial intelligence-based security report evaluation system 10 according to an embodiment of the present disclosure may include a plurality of evaluator terminals 200 and the device 100.

However, in some embodiments the system 10 may include fewer or more components than those shown in Figure 1.

In an embodiment of the present disclosure, the device 100 is a device that operates a bug bounty platform and provides a bug bounty service. That is, the device 100 may operate a bug bounty service for services it provides itself, or may provide a bug bounty service for various service platforms.

A participant writes a report on errors, security vulnerabilities, problems, and the like that occur on various service platforms and submits it to the device 100.

The device 100 then reviews the report received from the evaluator terminal 200 and provides a reward corresponding to the review result.

Reports on the same security vulnerability may be received from a plurality of evaluators; the device 100 reviews them and provides the reward to the evaluator who reported the vulnerability first.
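Once duplicate reports have been grouped, picking the first reporter is a matter of comparing submission times. The sketch below assumes that duplicate detection has already assigned each submission a `vulnerability_id`; that field, the `Submission` type, and the function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Submission:
    reporter: str
    vulnerability_id: str  # assumed to come from the duplicate-detection step
    submitted_at: datetime


def first_reporters(submissions: list[Submission]) -> dict[str, str]:
    """Map each vulnerability to the reporter with the earliest submission."""
    earliest: dict[str, Submission] = {}
    for submission in submissions:
        current = earliest.get(submission.vulnerability_id)
        if current is None or submission.submitted_at < current.submitted_at:
            earliest[submission.vulnerability_id] = submission
    return {vid: s.reporter for vid, s in earliest.items()}


submissions = [
    Submission("alice", "xss-profile", datetime(2024, 1, 10, 9, 30)),
    Submission("bob", "xss-profile", datetime(2024, 1, 9, 18, 0)),
    Submission("carol", "sqli-login", datetime(2024, 1, 11, 8, 0)),
]
winners = first_reporters(submissions)
```

In this example `bob` is rewarded for `xss-profile` because his report predates `alice`'s, and `carol` is the only reporter of `sqli-login`.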

Conventionally, every report was reviewed manually in this process to check for duplicates, which consumed considerable manpower and time, and evaluators also had to wait a long time to receive their rewards.

The artificial intelligence-based security report evaluation device 100, method, and program according to an embodiment of the present disclosure seek to solve these problems by automatically evaluating security reports with an artificial intelligence model to determine whether they are duplicates.

Below, the artificial intelligence-based security report evaluation device 100, method, and program according to an embodiment of the present disclosure are described in more detail with reference to the other drawings.

Figure 2 is a block diagram of an artificial intelligence-based security report evaluation device 100 according to an embodiment of the present disclosure.

도 2를 참조하면, 본 개시의 실시예에 따른 인공지능 기반의 보안 리포트 평가 장치(100)는 프로세서(110), 통신부(120) 및 메모리(130)를 포함한다.Referring to FIG. 2, the artificial intelligence-based security report evaluation device 100 according to an embodiment of the present disclosure includes a processor 110, a communication unit 120, and a memory 130.

다만, 몇몇 실시예에서 서버는 도 2에 도시된 구성요소보다 더 적은 수의 구성요소나 더 많은 구성요소를 포함할 수도 있다.However, in some embodiments, the server may include fewer or more components than those shown in FIG. 2.

The processor 110 may be implemented with a storage unit that stores data for an algorithm for controlling the operation of the components within the device 100, or for a program that reproduces the algorithm, and with at least one processor 110 that performs the above-described operations using the data stored in the storage unit. The storage unit and the processor 110 may be implemented as separate chips or as a single chip.

In addition, to implement the various embodiments of the present disclosure described in the drawings below on the device 100, the processor 110 may control any one of the components described above or a combination of them.

In addition to operations related to the application program, the processor 110 typically controls the overall operation of the device 100. The processor 110 may provide or process appropriate information or functions for the user by processing signals, data, and information that are input or output through the components described above, or by running an application program stored in the storage unit.

The processor 110 may also control at least some of the components of the device 100 in order to run an application program stored in the storage unit. Furthermore, the processor 110 may operate two or more of the components included in the device 100 in combination in order to run the application program.

One or more processors 110 may be provided. Hereinafter, the processor 110 may be regarded as plural even when it is expressed in the singular. The processor 110 may control the components of the security report evaluation device 100. The processor 110 may refer to a data processing device embedded in hardware that has physically structured circuitry for performing functions expressed as code or instructions contained in a program. Examples of such a hardware-embedded data processing device include a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA), but the scope of the present invention is not limited thereto. The processor 110 may be provided with a separate learning processor for performing artificial intelligence operations, or may itself include a learning processor.

The communication unit 120 may include one or more modules that connect the security report evaluation device 100 to one or more networks.

The communication unit 120 may include one or more components that enable communication with an external device, for example at least one of a broadcast reception module, a wired communication module, a wireless communication module, a short-range communication module, and a location information module.

The wired communication module may include various wired communication modules such as a Local Area Network (LAN) module, a Wide Area Network (WAN) module, or a Value Added Network (VAN) module, as well as various cable communication modules such as USB (Universal Serial Bus), HDMI (High Definition Multimedia Interface), DVI (Digital Visual Interface), RS-232 (Recommended Standard 232), power line communication, or POTS (plain old telephone service).

In addition to a Wi-Fi module and a WiBro (Wireless Broadband) module, the wireless communication module may include wireless communication modules that support various wireless communication schemes such as GSM (Global System for Mobile Communication), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), UMTS (Universal Mobile Telecommunications System), TDMA (Time Division Multiple Access), LTE (Long Term Evolution), 4G, 5G, and 6G.

The wireless communication module may include a wireless communication interface including an antenna and a transmitter that transmit communication signals. In addition, the wireless communication module may further include a signal conversion module that, under the control of the processor 110, modulates a digital control signal output from the processor 110 into an analog wireless signal through the wireless communication interface.

The short-range communication module is for short-range communication and may support it using at least one of Bluetooth, RFID (Radio Frequency Identification), Infrared Data Association (IrDA), UWB (Ultra-Wideband), ZigBee, NFC (Near Field Communication), Wi-Fi (Wireless Fidelity), Wi-Fi Direct, and Wireless USB (Wireless Universal Serial Bus) technologies.

The memory 130 may store data supporting various functions of the device 100. The memory 130 may store a number of application programs (applications) running on the device 100, as well as data and instructions for operating the device 100. At least some of these application programs may exist for the basic functions of the device 100. An application program may be stored in the memory 130, installed on the device 100, and run by the processor 110 to perform an operation (or function).

The memory 130 may store data supporting various functions of the device 100 and programs for the operation of the processor 110, may store input and output data (for example, music files, still images, and videos), and may store a number of application programs (applications) running on the device 100, as well as data and instructions for operating the device 100. At least some of these application programs may be downloaded from an external server via wireless communication.

The memory 130 may include at least one type of storage medium among a flash memory type, a hard disk type, an SSD type (Solid State Disk type), an SDD type (Silicon Disk Drive type), a multimedia card micro type, a card-type memory (for example, SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk. The memory 130 may also be a database that is separate from the device 100 but connected to it by wire or wirelessly.

The memory 130 may be electrically connected to the processor 110 and may store at least one piece of code executed by the processor 110. The term memory 130 may refer collectively to various types of storage devices. The memory 130 may store information needed to perform operations using artificial intelligence, machine learning, and artificial neural networks.

The memory 130 may store various learning models. The learning models stored in the memory 130 can infer result values for new input data that is not training data, and the inferred values can be used as a basis for deciding to perform an operation. The learning models stored in the memory 130 may be trained based on label information, and various backpropagation algorithms may be applied so that the loss function reaches a target value in order to increase training accuracy.

In addition, the storage unit may hold a plurality of processes for the security report evaluation device 100.

The input unit is for inputting image information (or signals), audio information (or signals), data, or information entered by a user, and may include at least one of at least one camera, at least one microphone, and a user input unit. Voice data or image data collected by the input unit may be analyzed and processed as a control command of the user.

The input unit receives information from the user. When information is entered through the input unit, the processor 110 may control the operation of the device 100 in accordance with the entered information. The input unit may include hardware physical keys (for example, a button, dome switch, jog wheel, or jog switch located on at least one of the front, rear, and side surfaces of the device 100) and software touch keys. As an example, a touch key may be a virtual key, soft key, or visual key displayed on a touchscreen-type display unit through software processing, or a touch key placed on a part other than the touchscreen. The virtual key or visual key may be displayed on the touchscreen in various forms, for example as a graphic, text, icon, video, or a combination thereof.

The output unit generates output related to sight, hearing, or touch, and may include at least one of a display unit, an audio output unit, a haptic module, and an optical output unit. The display unit may form a mutual layer structure with a touch sensor or be formed integrally with it, thereby implementing a touchscreen. Such a touchscreen may function as a user input unit that provides an input interface between the device 100 and the user, and at the same time provide an output interface between the device 100 and the user.

FIG. 3 is a flowchart of an artificial intelligence-based security report evaluation method according to an embodiment of the present disclosure.

FIGS. 4 and 5 are exemplary drawings for explaining an embodiment of the present disclosure.

The artificial intelligence-based security report evaluation method according to an embodiment of the present disclosure is described with reference to FIGS. 3 to 5.

The processor 110 receives a security report (S100).

More specifically, the processor 110 receives, through the communication unit 120, a security report written on the evaluator's terminal 200.

In one embodiment, the security report evaluation device 100 may provide its service through the web or an app. It may provide a security report form to the terminal 200 connected to the device 100, or it may receive the contents of the security report through a question-and-answer format.

In addition, the processor 110 may store the received security report in the memory 130 or on a cloud server.

The processor 110 inspects the security report received in S100 (S200).

The memory 130 stores various instructions and algorithms for inspecting security reports, and may store at least one artificial intelligence model trained to inspect security reports.

Using the at least one artificial intelligence model, the processor 110 inspects the security report received in S100 with at least one preset inspection method.

First, the algorithms and terms used below are defined.

NLP (Natural Language Processing): A field of research that enables computers to understand and process human language. It includes various subfields such as text analysis, language modeling, translation, and sentiment analysis.

Language Model: A model used in natural language processing (NLP) that predicts the probability of a sentence or word sequence. It can estimate how likely the next word or word sequence is to appear in a given context.

Feature: A variable or attribute that represents a specific aspect of the input data in a machine learning model. Features are used for training the model and for prediction.

Classification method: Classification is a subfield of machine learning that divides given data into predefined classes.

Semantic Textual Similarity (STS): A method of measuring the semantic similarity between two texts. By mapping the texts into a vector space and calculating similarity in that space, it can determine how close the two texts are in meaning.

Semantic similarity: A measure of the semantic relatedness between texts, words, or sentences. Semantic similarity can be used in natural language processing, information retrieval, data analysis, and similar areas.

Summarization: Condensing a long text or document into a shorter form without losing its core content.

Text Augmentation: A method of expanding a dataset by transforming existing text data. It is used to improve the performance of machine learning models, and its techniques include changing the order of words and substituting synonyms.

Back Translation: A method of translating text from one language into another and then back into the original language. It is used in text augmentation to create new text that differs from the original but has a similar meaning.

Pre-training: The process of first training a machine learning model on a general task. A language model can be pre-trained on a large amount of text data and then fine-tuned for a specific task.

Fine tuning: The process of further training a pre-trained model to optimize it for a specific task. It typically updates the model's weights using a small amount of labeled data.

Loss function: A function that measures the performance of the model. The smaller its value, the closer the model's predictions are to the actual values, so training aims to minimize it.

Referring to FIGS. 4 and 5, the processor 110 may inspect the security report using at least one of Pre-processing, Augmentation, Summarization, Semantic Textual Similarity, and Classifier.

Through Pre-processing, the processor 110 selects the necessary columns from the data in the security report, normalizes and transforms the data, handles missing values, converts categorical data, processes URLs and file paths, unifies categorical data, processes text data, and converts categorical data into key values.
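The pre-processing steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the column names, the category table, the placeholder tokens, and the function name `preprocess` are hypothetical, since the disclosure does not specify the report schema or the exact rules.

```python
import re

# Hypothetical column names; the disclosure does not give the report schema.
SELECTED_COLUMNS = ("title", "category", "description")
# Hypothetical table that unifies category spellings and maps them to keys.
CATEGORY_KEYS = {"xss": 0, "cross-site scripting": 0, "sqli": 1, "sql injection": 1}
UNKNOWN_CATEGORY = -1

URL_RE = re.compile(r"https?://\S+")
PATH_RE = re.compile(r"(?:/[\w.\-]+){2,}")


def preprocess(report: dict) -> dict:
    """Return a cleaned copy of one report (illustrative only)."""
    # Select the needed columns; missing values become empty strings.
    row = {col: (report.get(col) or "") for col in SELECTED_COLUMNS}
    # Normalize text: lowercase, replace URLs and file paths with
    # placeholder tokens, and collapse runs of whitespace.
    for col in ("title", "description"):
        text = row[col].lower()
        text = URL_RE.sub("<URL>", text)
        text = PATH_RE.sub("<PATH>", text)
        row[col] = " ".join(text.split())
    # Unify the categorical value and convert it to its key.
    category = row["category"].strip().lower()
    row["category"] = CATEGORY_KEYS.get(category, UNKNOWN_CATEGORY)
    return row
```

Replacing URLs and paths with fixed tokens is one possible way to "process" them; it keeps two reports about the same flaw from looking different only because they quote different addresses.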

Through Augmentation, the processor 110 may perform data augmentation on the data in the security report.

More specifically, the processor 110 may perform Text Augmentation. It may augment text using back translation, synonym replacement, random word insertion, random word swap, and random word deletion, and may perform data augmentation by randomly applying these methods multiple times to 30% of the data.

For example, in an embodiment of the present invention, the processor 110 may perform data augmentation using at least one of back translation, random swap, random deletion, synonym replacement, and random insertion.
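A minimal sketch of the word-level augmentation methods named above, applied at random to 30% of the data, might look like this. Back translation is left out because it needs a translation model. The synonym table, the function names, and the `rounds` and `seed` parameters are assumptions made for illustration, not details from the disclosure.

```python
import random

# Hypothetical synonym table; a real system would use a thesaurus or a model.
SYNONYMS = {"bug": ["flaw", "defect"], "leak": ["disclosure"]}


def random_swap(words, rng):
    """Swap two randomly chosen positions."""
    words = list(words)
    if len(words) >= 2:
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, rng, p=0.1):
    """Drop each word with probability p, keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept or [rng.choice(words)]


def synonym_replacement(words, rng):
    """Replace one word that has a known synonym, if there is one."""
    words = list(words)
    candidates = [i for i, w in enumerate(words) if w in SYNONYMS]
    if candidates:
        i = rng.choice(candidates)
        words[i] = rng.choice(SYNONYMS[words[i]])
    return words


def random_insertion(words, rng):
    """Insert a synonym of a random eligible word at a random position."""
    words = list(words)
    candidates = [w for w in words if w in SYNONYMS]
    if candidates:
        new_word = rng.choice(SYNONYMS[rng.choice(candidates)])
        words.insert(rng.randrange(len(words) + 1), new_word)
    return words


METHODS = (random_swap, random_deletion, synonym_replacement, random_insertion)


def augment(texts, ratio=0.3, rounds=2, seed=0):
    """Create new texts by applying random methods to `ratio` of the data."""
    rng = random.Random(seed)
    n_targets = max(1, round(len(texts) * ratio)) if texts else 0
    augmented = []
    for text in rng.sample(list(texts), n_targets):
        words = text.split()
        for _ in range(rounds):  # apply a randomly chosen method several times
            words = rng.choice(METHODS)(words, rng)
        augmented.append(" ".join(words))
    return augmented
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible, which makes it easier to compare models trained on the augmented data.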

The processor 110 may perform Summarization using an artificial intelligence model, and may use pre-training data and fine-tuning data for it. Specifically, the processor 110 summarizes the text data in the security report to compress the semantic features of the text, which can improve performance in the subsequent STS and Classification steps. The text data in the security report to which this applies includes fields such as the detailed description and the remediation plan.

For text that exceeds a preset length, the processor 110 shortens it according to a preset condition, which provides a comprehensive data preprocessing effect.

The processor 110 may perform Semantic Textual Similarity using an artificial intelligence model. It calculates a similarity optimized for sentences and feeds the calculated similarity into the Classifier (classification model) as one Feature.

In an embodiment of the present invention, the processor 110 may fine-tune a deep learning-based Language Model on a semantic textual similarity task and use that model to calculate document-level similarity.
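As a rough sketch of how a similarity score can be produced and then used as one feature, the example below maps each text to a vector and computes cosine similarity. The word-count embedding is only a stand-in for the fine-tuned language model described above, and the names `embed` and `sts_feature` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Placeholder embedding: word counts. A fine-tuned language model
    would return a dense vector here instead."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sts_feature(new_report: str, known_reports: list[str]) -> float:
    """Highest similarity between the new report and any known report."""
    new_vec = embed(new_report)
    return max(
        (cosine_similarity(new_vec, embed(r)) for r in known_reports),
        default=0.0,
    )
```

Taking the maximum over known reports is one design choice among several; a system could equally pass the top few scores to the classifier.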

The processor 110 may perform the Classifier step using an artificial intelligence model, namely a classification model, and may use a Custom Loss Function that applies a weight according to the calculated value of the STS Feature.
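The disclosure does not give the form of the custom loss function. One plausible reading, sketched below purely as an assumption, is a binary cross-entropy in which each sample's loss is scaled by (1 + its STS value), so that pairs the STS model found more similar contribute more to training. The function name and the weighting formula are hypothetical.

```python
import math


def sts_weighted_bce(probs, labels, sts_values, eps=1e-7):
    """Mean binary cross-entropy with per-sample weight (1 + STS value).

    probs: predicted probability that each report is a duplicate
    labels: 1 for duplicate, 0 otherwise
    sts_values: STS feature for each sample, assumed to lie in [0, 1]
    """
    if not probs:
        raise ValueError("at least one sample is required")
    total = 0.0
    for p, y, s in zip(probs, labels, sts_values):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += (1.0 + s) * bce
    return total / len(probs)
```

With an STS value of 0 the expression reduces to ordinary binary cross-entropy, so the weighting only ever adds emphasis.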

Based on the inspection result of S200, the processor 110 determines whether the security report is a duplicate (S300).

When the process of the preset inspection method shown in FIGS. 4 and 5 is complete, the processor 110 may calculate and output the degree of duplication of the security report.

As an example, the processor 110 may use an artificial intelligence model to summarize the contents of the security report, calculate the semantic similarity of the summarized contents, and calculate the degree of duplication of the security report based on the similarity result. Here, the processor 110 may calculate the similarity by performing an STS (Semantic Textual Similarity) task based on the artificial intelligence model, use a loss function to apply an additional weight according to the calculated similarity, and then calculate the degree of duplication. It may also use data augmentation techniques to increase the amount and diversity of data in the received security report.

The artificial intelligence-based security report evaluation device 100 according to an embodiment of the present disclosure may further perform a document summarization (Text Summarization) process before calculating the STS-based similarity.

As an example, the processor 110 may fine-tune a Language Model with data suited to the text summarization task and thereby produce a deep learning-based summary of the security report.

This summarization provides the following effects.

① Because similarity is calculated for the summarized document instead of the original document, the input to the similarity calculation is smaller and the time required for the similarity check is shortened.

② During summarization, the important content of the security report is emphasized and unnecessary content is excluded, which allows a more accurate similarity calculation.

③ During summarization, unnecessary or redundant information is removed, which reduces noise in the similarity calculation and yields a more accurate similarity between documents.

④ A summarized document is generally smaller than the original document, which reduces memory usage.

After augmenting the data of the security report, the processor 110 may calculate the consistency between the security report and the augmented security report, and may use the augmented security report to check the degree of duplication only when the preset consistency is satisfied.

In one embodiment, the processor 110 may measure the consistency between the security report and the augmented security report based on the summary, report target, report purpose, and report contents of each, obtained using an artificial intelligence model. If the target and purpose of the security report and of the augmented security report do not match, the processor 110 may determine that data augmentation has failed.
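The consistency gate described above could be sketched as follows. The `Summary` fields mirror the report target, purpose, and contents named in the text, but the word-overlap score, the 0.6 threshold, the function names, and the choice to treat a mismatch in either target or purpose as a failure are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    target: str   # report target, e.g. the affected service
    purpose: str  # report purpose, e.g. the vulnerability class
    content: str  # short summary of the report body


def jaccard(a: str, b: str) -> float:
    """Word-set overlap, used here as a placeholder consistency score."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def augmentation_consistency(original: Summary, augmented: Summary) -> float:
    """Return 0.0 when target or purpose differ, else the content overlap."""
    if original.target != augmented.target or original.purpose != augmented.purpose:
        return 0.0  # augmentation is treated as failed
    return jaccard(original.content, augmented.content)


def usable_for_duplicate_check(original: Summary, augmented: Summary,
                               threshold: float = 0.6) -> bool:
    """Hypothetical threshold standing in for the 'preset consistency'."""
    return augmentation_consistency(original, augmented) >= threshold
```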

FIG. 6 is a diagram illustrating translating a security report into another language, translating it back into the original language, and checking for duplication using the re-translated security report.

Referring to FIG. 6, the processor 110 translates a security report written in a first language into at least one other language. The processor 110 then translates the translated security report back into the first language and inputs the re-translated security report into an artificial intelligence model to determine whether it is a duplicate.

Here, the processor 110 may use the artificial intelligence model to inspect the re-translated security report with at least one preset inspection method and determine, based on the inspection result, whether the security report is a duplicate.

To this end, the security report evaluation device 100 may further include a translation module or translation model.

In one embodiment, the processor 110 translates a security report written in the first language into a plurality of different languages. The processor 110 then translates each translated security report back into the first language to generate a plurality of re-translated security reports.

Using an explainable artificial intelligence model, the processor 110 generates a first summary that explains the security report.

Using the explainable artificial intelligence model, the processor 110 generates a plurality of second summaries, each of which explains one of the re-translated security reports.

A summary may include the report target, report purpose, and report contents of the security report.

The processor 110 measures the degree of agreement between the first summary and each of the second summaries, and selects the re-translated security report whose second summary has the highest agreement as the document to be inspected.

In this way, the security report evaluation device 100 can prevent the contents of the security report from being changed by re-translation.
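The selection step above can be sketched as a loop over pivot languages. The `translate(text, src, dst)` and `summarize(text)` callables are hypothetical stand-ins for the translation model and the explainable AI summarizer, and exact field matching is only a placeholder for the agreement measure, which the disclosure does not specify.

```python
def agreement(first: dict, second: dict) -> float:
    """Fraction of summary fields (target, purpose, content) that match.
    Exact matching is a placeholder for the unspecified agreement measure."""
    fields = ("target", "purpose", "content")
    return sum(first.get(f) == second.get(f) for f in fields) / len(fields)


def select_retranslation(original_text, source_lang, pivot_languages,
                         translate, summarize):
    """Return (language, retranslated_text, score) with the highest
    agreement, or None when no pivot language is given."""
    first_summary = summarize(original_text)
    best = None
    for lang in pivot_languages:
        # Translate into the pivot language, then back to the source language.
        pivot_text = translate(original_text, source_lang, lang)
        retranslated = translate(pivot_text, lang, source_lang)
        score = agreement(first_summary, summarize(retranslated))
        if best is None or score > best[2]:
            best = (lang, retranslated, score)
    return best
```

Passing the models in as callables keeps the sketch independent of any particular translation or summarization library.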

In addition, the processor 110 stores the report target, report purpose, and report content of the security report together with the degree of agreement measured between the first summary and the second summaries, and trains the artificial intelligence model on them.

When a new security report is received, the processor 110 generates a summary of the security report (report target, purpose, and content) using the explainable artificial intelligence model and, based on this summary, may select the language into which the security report is translated in order to obtain the re-translated security report.

In other words, for a given report target, purpose, and content there is a translation language best suited to the inspection, and using that language improves efficiency.

FIG. 7 is a diagram illustrating obtaining information on a security report using an explainable artificial intelligence model and thereby determining the data columns to be used in analyzing the security report.

Referring to FIG. 7, the processor 110 extracts a plurality of keywords from the security report based on the report target, report purpose, and report content of the security report.

Keywords that can represent the report target, report purpose, and report content of the security report may be selected as the plurality of keywords.

The processor 110 may use the explainable artificial intelligence model to obtain the report target, report purpose, and report content that explain the security report.

The processor 110 may then select, from among a plurality of data columns, at least one data column to be used in determining whether the security report is a duplicate, based on the obtained report target, report purpose, and report content.

FIG. 8 is a diagram illustrating the items included in the plurality of data columns.

Here, the plurality of data columns may include at least one of: report title, location where the security vulnerability was found, scope, attack point, payload, attack type, attack impact, evaluation of the evaluator who wrote the security report, vulnerability description, remediation plan, company name, program name, and submission date and time.

Referring to FIG. 8, an example of each of the data columns mentioned above is shown.
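
As a rough illustration of the data columns listed above, the record below models one security report and keeps only the columns selected for the duplicate check. The field names, the `ReportColumns` class, and the `project` helper are assumptions made for this sketch. The source lists the columns but does not define a schema or a selection rule.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Iterable


@dataclass
class ReportColumns:
    """One security report; every field name here is hypothetical."""
    report_title: str = ""
    discovery_location: str = ""
    scope: str = ""
    attack_point: str = ""
    payload: str = ""
    attack_type: str = ""
    attack_impact: str = ""
    evaluator_evaluation: str = ""
    vulnerability_description: str = ""
    remediation_plan: str = ""
    company_name: str = ""
    program_name: str = ""
    submitted_at: str = ""


def project(report: ReportColumns, selected: Iterable[str]) -> Dict[str, str]:
    """Keep only the columns chosen for the duplicate check."""
    names = list(selected)
    row = asdict(report)
    unknown = [name for name in names if name not in row]
    if unknown:
        raise KeyError(f"unknown columns: {unknown}")
    return {name: row[name] for name in names}


report = ReportColumns(
    report_title="Stored XSS in profile page",
    payload="<script>alert(1)</script>",
    attack_type="XSS",
)
subset = project(report, ["payload", "attack_type"])
```

Which columns are passed to `project` would come from the report target, purpose, and content obtained from the explainable artificial intelligence model; that decision logic is outside this sketch.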

FIG. 9 is a diagram illustrating summarizing a security report using an explainable artificial intelligence model and using the summary to check for duplicates.

Referring to FIG. 9, the processor 110 requests the explainable artificial intelligence model to produce a summarizing explanation of the security report that includes its report target, report purpose, and report content, and obtains the resulting summary from the explainable artificial intelligence model.

The processor 110 may then inspect the summary obtained from the explainable artificial intelligence model using the at least one preset inspection method. That is, the processor 110 may use the artificial intelligence model to inspect the obtained summary with the at least one preset inspection method, and may determine whether the security report is a duplicate based on the inspection result.

If the processor 110 determines that the security report is a duplicate, it generates a command prompt for producing an answer to be provided to the evaluator, based on the duplicate elements and the degree of duplication.

The processor 110 then inputs the generated command prompt into a generative model and provides the answer data obtained from the generative model to the evaluator to explain the reason for the duplication.

If the security report is determined to be a duplicate, the processor 110 may extract at least one duplicate element from the document whose content overlaps with the security report.

The processor 110 may then calculate, for each extracted duplicate element, a redundancy value indicating the degree of overlap. Here, the redundancy may be set to 0 when there is no overlap at all and to 100 when the elements are completely identical.

Accordingly, the processor 110 generates a command prompt for producing an answer that explains to the evaluator that the security report is a duplicate, based on the report target, report purpose, and report content of the security report, the duplicate elements, and the redundancy of each duplicate element, and inputs the prompt into the generative model to obtain the answer.

The processor 110 provides the obtained answer to the evaluator terminal 200 through the communication unit 120.
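
The prompt construction described above might look like the sketch below. The prompt wording, the function name `build_duplicate_prompt`, and the ordering of elements are assumptions; the source specifies only the inputs (report target, purpose, content, duplicate elements, and a redundancy value from 0 to 100 per element). The call to the generative model is omitted because the source does not identify one.

```python
from typing import Dict


def build_duplicate_prompt(
    target: str, purpose: str, content: str, redundancy: Dict[str, float]
) -> str:
    """Assemble a command prompt asking a generative model to explain a duplicate."""
    lines = [
        "Explain to the evaluator why the submitted security report is a duplicate.",
        f"Report target: {target}",
        f"Report purpose: {purpose}",
        f"Report content: {content}",
        "Duplicate elements (redundancy: 0 = no overlap, 100 = identical):",
    ]
    # List the most strongly overlapping elements first.
    for element, degree in sorted(redundancy.items(), key=lambda item: -item[1]):
        clamped = max(0.0, min(100.0, float(degree)))  # keep within the 0..100 scale
        lines.append(f"- {element}: {clamped:.0f}")
    return "\n".join(lines)


prompt = build_duplicate_prompt(
    "login page",
    "report a stored XSS",
    "script injection via the nickname field",
    {"attack point": 72.4, "payload": 100},
)
```

The resulting string would be passed to the generative model, and the returned answer forwarded to the evaluator terminal 200.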

In an embodiment of the present disclosure, checking whether a security report is a duplicate may mean checking whether a document with content similar to the security report exists among previous documents.

The security report evaluation device 100 may determine duplication and calculate similarity using the methods below; any other algorithm suitable for determining whether a security report is a duplicate and calculating similarity may also be applied.

In one embodiment, the processor 110 applies n hash functions to the security report to obtain a plurality of hash values, and derives a feature vector consisting of n feature values using a feature function whose rule selects, for each of the n hash functions, one of its plurality of hash values as the feature value.

The processor 110 stores a detection database consisting of n tables, built from the n feature values calculated for each of a plurality of previously collected documents, in which no feature value is repeated within a table. Here, the previously collected documents may be existing security reports that have already been reported.

The processor 110 may compare each of the n feature values derived from the security report with the feature values in the corresponding one of the n tables of the detection database, and calculate the similarity of the security report according to the proportion of matching feature values.
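
The hash-based scheme above resembles MinHash, and the sketch below reads it that way. Several details are assumptions not stated in the source: the report is split into three-word shingles, the i-th hash function is SHA-256 salted with the index i, and the feature function selects the minimum hash value. The names `shingles`, `feature_vector`, and `DetectionDatabase` are hypothetical.

```python
import hashlib
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Split text into overlapping k-word pieces (an assumed preprocessing step)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def feature_vector(text: str, n: int = 16) -> List[int]:
    """Apply n salted hash functions and keep the minimum value of each."""
    pieces = shingles(text)
    vector = []
    for i in range(n):
        values = [
            int.from_bytes(hashlib.sha256(f"{i}:{p}".encode()).digest()[:8], "big")
            for p in pieces
        ]
        vector.append(min(values))  # feature function: select the smallest hash
    return vector


class DetectionDatabase:
    """n tables; table i holds the i-th feature value of every stored document."""

    def __init__(self, n: int = 16) -> None:
        self.n = n
        self.tables: List[Set[int]] = [set() for _ in range(n)]

    def add(self, text: str) -> None:
        for table, value in zip(self.tables, feature_vector(text, self.n)):
            table.add(value)  # a set keeps values unique within each table

    def similarity(self, text: str) -> float:
        """Fraction of the n feature values that already appear in their table."""
        vector = feature_vector(text, self.n)
        hits = sum(1 for table, value in zip(self.tables, vector) if value in table)
        return hits / self.n


db = DetectionDatabase(n=16)
db.add("SQL injection in the search parameter of the orders API")
same = db.similarity("SQL injection in the search parameter of the orders API")
other = db.similarity("Missing rate limit on password reset endpoint")
```

Because each table pools the feature values of all stored documents, this score indicates overlap with the collection as a whole rather than with one specific earlier report.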

The processor 110 extracts a plurality of keywords from the security report based on the report target, report purpose, and report content of the security report.

The processor 110 may extract a plurality of first keywords related to the report target, report purpose, and report content of the security report. Based on the first keywords, the processor 110 then extracts from the security report a plurality of second keywords whose similarity to the first keywords is at or above a preset similarity.

The processor 110 assigns a first weight to each first keyword and assigns a second weight to each second keyword based on its similarity.

The processor 110 searches a database storing previously reported documents (security reports) for documents that match the first and second keywords or have at least a preset level of similarity to them.

If such a target document is found, the processor 110 selects, from among the first and second keywords, the keywords that match the target document or have at least the preset level of similarity to it. The processor 110 calculates the final similarity based on the weights assigned to the selected keywords.

The processor 110 determines whether the security report is a duplicate based on the calculated final similarity.
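
One possible reading of the weighted keyword comparison is sketched below. The source does not give the weighting formula, so the choices here are assumptions: each second keyword's weight is the first weight scaled by its similarity, a keyword counts as matched when it appears as a substring of the target document, and the final similarity is the matched weight divided by the total weight.

```python
from typing import Dict, Iterable


def final_similarity(
    first_keywords: Iterable[str],
    second_keywords: Dict[str, float],
    target_document: str,
    first_weight: float = 1.0,
) -> float:
    """Weighted share of keywords that also occur in the target document."""
    weights = {keyword: first_weight for keyword in first_keywords}
    for keyword, similarity in second_keywords.items():
        # Second weight derived from similarity to the first keywords (assumed rule).
        weights.setdefault(keyword, first_weight * similarity)
    total = sum(weights.values())
    if total == 0:
        return 0.0
    text = target_document.lower()
    matched = sum(w for keyword, w in weights.items() if keyword.lower() in text)
    return matched / total


score = final_similarity(
    ["xss", "login"],
    {"cross-site scripting": 0.8},
    "Stored XSS on the login page",
)
is_duplicate = score >= 0.7  # hypothetical threshold, not taken from the source
```

Here both first keywords occur in the target document and the second keyword does not, so the score is 2.0 / 2.8.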

The method according to an embodiment of the present disclosure described above may be implemented as a program (or application) and stored in a medium so as to be executed in combination with a server, which is hardware.

For the computer to read the program and execute the methods implemented as the program, the program described above may include code written in a computer language such as C, C++, JAVA, or machine language that the processor 110 (CPU) of the computer can read through the device interface of the computer. Such code may include functional code related to functions that define the functions needed to execute the methods, and may include execution-procedure control code needed for the processor 110 of the computer to execute those functions in a predetermined order. The code may further include memory-reference code indicating at which location (address) of the internal or external memory 130 of the computer the additional information or media needed for the processor 110 to execute the functions should be referenced. In addition, when the processor 110 of the computer needs to communicate with another remote computer or server in order to execute the functions, the code may further include communication-related code specifying how to communicate with the remote computer or server using the communication module of the computer and what information or media should be transmitted and received during communication.

The storage medium refers not to a medium that stores data for a short moment, such as a register, cache, or memory 130, but to a medium that stores data semi-permanently and can be read by a device. Specific examples of the storage medium include, but are not limited to, ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage devices. That is, the program may be stored on various recording media on various servers that the computer can access, or on various recording media on the evaluator's computer. The medium may also be distributed over a computer system 10 connected by a network, so that computer-readable code is stored in a distributed manner.

The steps of the method or algorithm described in connection with the embodiments of the present disclosure may be implemented directly in hardware, as a software module executed by hardware, or as a combination of the two. The software module may reside in RAM (Random Access Memory), ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any form of computer-readable recording medium well known in the art to which the present disclosure pertains.

Embodiments of the present disclosure have been described above with reference to the attached drawings, but those skilled in the art to which the present disclosure pertains will understand that the present disclosure may be implemented in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

10: Artificial intelligence-based security report evaluation system
100: Security report evaluation device
110: Processor
120: Communication unit
130: Memory
200: Evaluator terminal

Claims

A communication unit;
a memory storing at least one instruction; and
a processor,
wherein the processor executes the at least one instruction to:
receive a security report through the communication unit,
inspect the received security report with at least one preset inspection method using an artificial intelligence model,
determine, based on the inspection result, whether the received security report is a duplicate,
extract a plurality of keywords from the security report based on the report target, report purpose, and report content of the security report,
obtain, using an explainable artificial intelligence model, the report target, report purpose, and report content that explain the security report, and
select, from among a plurality of data columns, at least one data column to be used in determining whether the security report is a duplicate, based on the obtained report target, report purpose, and report content,
Artificial intelligence-based security report evaluation device.

According to claim 1,
The processor,
Summarizes the content of the received security report using an artificial intelligence model,
Calculates the semantic similarity of the summarized content, and
Calculates the degree of duplication of the received security report based on the calculated similarity,
Artificial intelligence-based security report evaluation device.

According to claim 1,
The processor,
Calculates similarity by performing an STS (Semantic Textual Similarity) task based on an artificial intelligence model, applies an additional weight according to the calculated similarity using a loss function, and then calculates the degree of duplication,
Artificial intelligence-based security report evaluation device.

According to claim 1,
The processor,
Increases the data amount and diversity of the received security report using a data augmentation technique,
Artificial intelligence-based security report evaluation device.

According to claim 1,
The processor,
Translates the security report written in a first language into at least one other language,
Translates the security report translated into the other language back into the first language to generate a re-translated security report, and
Inspects the re-translated security report with the at least one preset inspection method using the artificial intelligence model,
Artificial intelligence-based security report evaluation device.

(Deleted)

According to claim 1,
The plurality of data columns
Include report title, security vulnerability discovery location, scope, attack point, payload, attack type, attack impact, evaluation of the user who created the security report, vulnerability description, action plan, company name, program name, and submission date,
Artificial intelligence-based security report evaluation device.

According to claim 1,
Requesting, from the explainable artificial intelligence model, a summarizing explanation of the security report that includes the obtained report target, report purpose, and report content, and
Inspecting the summarized explanation obtained from the explainable artificial intelligence model using the at least one preset inspection method,
Artificial intelligence-based security report evaluation device.

A method performed by a device, comprising:
Receiving a security report through a communication unit;
Inspecting the received security report with at least one preset inspection method using an artificial intelligence model; and
Determining whether the received security report is a duplicate based on the result of the inspection,
Wherein the device
Extracts a plurality of keywords from the security report based on the report target, report purpose, and report content of the security report,
Obtains, using an explainable artificial intelligence model, the report target, report purpose, and report content that explain the security report, and
Selects, from among a plurality of data columns, at least one data column to be used in determining whether the security report is a duplicate, based on the obtained report target, report purpose, and report content,
Artificial intelligence-based security report evaluation method.

A computer-readable recording medium combined with a hardware computer and storing a program for executing the method of claim 9.